Understanding Clojure's Persistent Vectors, pt. 1 (2013) (hypirion.com)
107 points by mirzap 16 hours ago | 27 comments



IMHO, this is one of the coolest aspects of Clojure: the data structures were designed to be immutable as efficiently as possible. This allows some of the most critical aspects of the language to function. If your whole program passes around values that are immutable, you don't ever have to worry about which part of the code owns what. Rust is such a different language, but the borrow checker is solving a nearly identical problem: who owns this memory, and who is allowed to change it? In Rust, only one owner at a time gets to write to a piece of memory, and you can move that ownership around. It catches lots of bugs and is extremely efficient, but it is definitely a difficult aspect of the language to master. Clojure just says everyone gets to read and share everything, because every mutation is a copy. One piece of your codebase can never stomp on or free memory that another part is using. There are big tradeoffs with garbage collection, but geeze does it make reasoning about Clojure programs SO much easier. And the cornerstone of that idea is fast, immutable data structures by default. Maps are implemented similarly: they are fast to make mutated copies of, with structural sharing to save memory.

It's surprising to me how well these go together: the transient concept in Clojure is essentially a &mut. Couple that with a reference-count check and you get fast transient mutations with a cheap persistent clone.

All of Rust's persistent immutable data-structure libraries, like im, make use of this for drastically more efficient operations without loss of capability or programming style.

I used the same principle for my Rust re-imagining of DataScript [1], and the borrow checker together with some Merkle-DAG magic allows for some really interesting optimisations, like set operations with almost no additional overhead.

This allows me to do things like not making insert the primary way you get data into a database: you simply create database fragments and union them:

  let herbert = ufoid();
  let dune = ufoid();
  let mut library = TribleSet::new();

  library += entity! { &herbert @
      literature::firstname: "Frank",
      literature::lastname: "Herbert",
  };

  library += entity! { &dune @
      literature::title: "Dune",
      literature::author: &herbert,
      literature::quote: ws.put(
          "I must not fear. Fear is the mind-killer."
      ),
  };

  ws.commit(library, "import dune");

The entity! macro itself just creates a TribleSet, and += is just union.

In Clojure this would have been too expensive: you would have to make a copy of the tries every time, because without borrow-checked unique ownership you can't safely reuse and mutate the trie nodes in place.

1: https://github.com/triblespace/triblespace-rs


Most languages have some sort of persistent/immutable data library available. But that's not the same as having it built in! Otherwise you have to weave third-party types through the entire application and convert on behalf of libraries that aren't compatible. There is an enormous benefit to having these data structures as first-class citizens.

And thank you for highlighting the conceptual link to Rust's borrow checker. Clojure and Rust are far and away my two favorite languages and this largely articulates why: they both have language-level constructs to completely solve the data ownership problem. Not just "here's some tools to optionally solve it, good luck" but truly, fully solve it. Full stop.

Almost every other language leaves ownership enforcement up to the developer. Clojure and Rust take entirely different approaches to virtually everything, but they converge on this one point and it's a crucial one. Types, syntax, compilation models, garbage collection, etc. are all secondary concerns to managed data ownership IMO.


There are also atomic references for when mutability is the best approach. This enforces a single writer, but in the API rather than as a compile-time proof.

He went on to implement https://github.com/hypirion/c-rrb, which are just like Clojure's vectors but have fast insertions/deletions and merges.

I semi-ported it to c# here: https://github.com/bjoli/RrbList/tree/main/src/Collections

It is faster than Clojure's vectors (running on the JVM, so apples and cucumbers) in all cases, and mostly beats Scala's vectors, except for splitting, which is crazy fast in Scala's implementation.


Oh god, I remember: I tried to implement this in Dylan once, long ago. I didn't get very far, but I really liked the data structure:

https://github.com/nickik/RRB-Vector-in-Dylan/blob/master/RR...


I tried in Scheme first, but failed miserably. Doing it in C# was easier since I could more directly compare code.

Dylan wasn't the reason I failed; I found it nice to work with.

If you want to go further, the HAMT designed by Bagwell was apparently improved upon by Steindorfer to give CHAMP. See this FSet MR and the paper link within for more: https://gitlab.common-lisp.net/fset/fset/-/merge_requests/1

Would be cool to have recent findings on persistent data structures through the lens of cache locality.

I know of people using ref counting to support allocation arenas for immutable structures. For some workloads that gives a pretty crazy performance boost.

Just pre-allocating leaf nodes can reduce iteration overhead by 40%.


Scala has a similar data structure https://www.scala-lang.org/api/2.13.6/scala/collection/immut.... Something was in the air in 2013.

Clarifying: Clojure had them in 2007. Scala's implementations were inspired by Clojure's. Both Odersky and Bagwell have given credit to Hickey.

And Hickey himself said he adapted ideas from Bagwell's HAMTs. And tries are 60 years old.

I have always thought Hickey's main contribution was making it the default in a coherent way, and proving it could be done. Before Clojure, most people still thought immutable data structures were too impractical.


That's a big contribution. Also, the original HAMTs are not a functional data structure; see Section 3.4.1 in https://docdrop.org/download_annotation_doc/3386321-trk2f.pd...

No, but persistent bit-partitioned tries were pretty well known in the late 90s (I first met them in Standard ML in 2005).

I think the Clojure version does have some actual improvements over the Bagwell version, and some implementation tricks as well. But I don't remember all the details.

Well, sure. But it is not like Hickey invented the 5-bit partitioned trie (there is work in SML and Haskell before that), nor did he invent functional tries.

He took what was a research topic and made it standard. There were no other 5-bit partitioned tries in (wide) use. I think he did that in a way that signals a fantastic sense of taste, and if you are implementing a programming language you need taste.


Real World Haskell was published in 2008, followed by Real World OCaml in 2013.

Scala was introduced in 2004, with the first Programming in Scala book in 2008.

HN had plenty of PureScript and Elm.

FP was finally going to get its spotlight, and then mainstream languages got enough FP features to keep being the go-to tooling.


Those are not really the same: those are N=32 finger trees, which have extra benefits (quick slices and quicker insertions, for example).

It’s not a secret that they came from the same zeitgeist but took wildly different approaches, but inspired each other.

I still don’t understand why they’re referred to as persistent vectors rather than immutable vectors, but I digress.


I still don’t understand why they’re referred to as persistent vectors rather than immutable vectors, but I digress.

I believe that immutable just means, well, immutable, but persistent means that old versions remain usable after updates, which are achieved via structural sharing, so they're efficient.


structural sharing = log n updates

if you think immutable updates are O(n) in 2026, you're so far behind the curve it's laughable

it's crazy how many people I interview just stop thinking and insist you can't do better than O(n)


Can you please share these data structures you are talking about? What papers?

Because standard libraries in mainstream languages are name-squatting on 'immutable' pretty hard.

You wanted 2+1 to yield 3, but instead you get a runtime exception telling you that 2 can't be changed.


Okasaki came first.

Haskell had them too (IntMap honestly works fine in that use case).


Yeah, “Purely Functional Data Structures” came out in 1996! I read most of it recently, when I needed a C++ copy-on-write hash map. It was still fairly relevant (although the “obvious” solution of a HAMT was also the one we decided on).


