IMHO, this is one of the coolest aspects of Clojure: the data structures were designed to be immutable as efficiently as possible. That enables some of the most critical aspects of the language. If your whole program passes around immutable values, you never have to worry about which part of the code owns what. Rust is a very different language, but the borrow checker is solving a nearly identical problem: who owns this memory, and who is allowed to change it? In Rust, only one owner at a time gets to write to a piece of memory, and you can move that ownership around. It catches lots of bugs and is extremely efficient, but it is definitely a difficult aspect of the language to master. Clojure just says everyone gets to read and share everything, because every mutation is a copy. One piece of your codebase can never stomp on or free memory that another part is using. There are big tradeoffs with garbage collection, but geez does it make reasoning about Clojure programs SO much easier. And the cornerstone of that idea is fast, immutable data structures by default. Maps are implemented similarly: mutated copies are fast, with structural sharing to save memory.
It's surprising to me how well these go together: the transient concept in Clojure is essentially a &mut. Couple that with a reference-count check and you get fast transient mutations with cheap persistent clones.
All of Rust's persistent immutable data structure libraries, like im, make use of this for drastically more efficient operations without any loss in capability or programming style.
I used the same principle for my Rust re-imagination of DataScript [1], where the borrow checker together with some Merkle DAG magic allows for some really interesting optimisations, like set operations with almost no additional overhead.
That allows me to do things like not having insert be the primary way you get data into the database: you simply create database fragments and union them:
let herbert = ufoid();
let dune = ufoid();

let mut library = TribleSet::new();

library += entity! { &herbert @
    literature::firstname: "Frank",
    literature::lastname: "Herbert",
};

library += entity! { &dune @
    literature::title: "Dune",
    literature::author: &herbert,
    literature::quote: ws.put(
        "I must not fear. Fear is the mind-killer."
    ),
};

ws.commit(library, "import dune");
The entity! macro itself just creates a TribleSet, and += is just set union.
In Clojure this would have been too expensive: you would have to copy the tries every time, and you couldn't reuse trie nodes in place, because there's no borrow checker to prove their unique ownership.
Most languages have some sort of persistent/immutable data library available. But that's not the same as having it built in! You have to weave third-party types through the entire application and convert on behalf of libraries that aren't compatible. There is an enormous benefit to having these data structures as first-class citizens.
And thank you for highlighting the conceptual link to Rust's borrow checker. Clojure and Rust are far and away my two favorite languages and this largely articulates why: they both have language-level constructs to completely solve the data ownership problem. Not just "here's some tools to optionally solve it, good luck" but truly, fully solve it. Full stop.
Almost every other language leaves ownership enforcement up to the developer. Clojure and Rust take entirely different approaches to virtually everything, but they converge on this one point and it's a crucial one. Types, syntax, compilation models, garbage collection, etc. are all secondary concerns to managed data ownership IMO.
It is faster than Clojure's vectors (which run on the JVM, so apples and cucumbers) in all cases, and mostly beats Scala's vectors, except for splitting, which is crazy fast in Scala's vectors.
I know of people using reference counting to support allocation arenas for immutable structures. For some workloads that gives a pretty crazy performance boost.
Just pre-allocating leaf nodes can reduce iteration overhead by 40%.
And Hickey himself said he adapted ideas from Bagwell's HAMTs. And tries are 60 years old.
I have always thought Hickey's main contribution was making it the default in a coherent way, and proving it could be done. Before Clojure, most people still thought immutable data structures were too impractical.
I think the Clojure version does have some actual improvements over the Bagwell version, and some implementation-level tricks as well. But I don't remember all the details.
Well, sure. But it's not like Hickey invented the 5-bit partitioned trie (there is prior work in SML and Haskell), nor did he invent functional tries.
He took what was a research topic and made it standard. There were no other 5-bit partitioned tries in (wide) use. I think he did that in a way that signals a fantastic sense of taste, and if you are implementing a programming language, you need taste.
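The "5-bit partitioned" part is easy to sketch: each trie level consumes the next 5 bits of the hash as an index into a node of up to 32 children (hypothetical helper, not Clojure's actual source):

```rust
// Each trie level peels off 5 bits of the hash, giving a
// branching factor of 2^5 = 32 children per node.
fn child_index(hash: u32, level: u32) -> usize {
    ((hash >> (level * 5)) & 0b1_1111) as usize
}

fn main() {
    // Bits written level-by-level: level 0 -> 1, level 1 -> 2, level 2 -> 3.
    let hash: u32 = 0b00011_00010_00001;
    assert_eq!(child_index(hash, 0), 1);
    assert_eq!(child_index(hash, 1), 2);
    assert_eq!(child_index(hash, 2), 3);
    // A 32-bit hash yields about six such levels before the bits run out,
    // which keeps lookups a handful of pointer hops deep.
}
```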
Yeah, “Purely Functional Data Structures” came out in 1996! I read most of it recently, when I needed a C++ copy-on-write hash map. It was still fairly relevant (although the “obvious” solution of a HAMT was also the one we decided on).