If you want real immutable data structures, not a cheap imitation, check out pyrsistent.
Look up persistent data structures [0]
The main trick is to store everything in a tree with a large branching factor.
The elements live at the leaves. When you want to make an edit, you only need to create new nodes along the path from the root to the affected leaf, then return the new root. Everything else is shared.
Each edit is thus a logarithmic operation; implementations typically use a branching factor of 32.
It works for vectors as well as dicts. Creating a copy is constant time, and since the structure is immutable it can be shared freely (especially in GC'd languages).
Insert/delete are O(log32(n)) instead of the O(n) they'd be with a full copy.
[0] https://en.wikipedia.org/wiki/Persistent_data_structure#Pers...
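To make the path-copying idea concrete, here's a minimal sketch in Python. It's hypothetical and stripped down (fixed depth, no append/resize, tuples as nodes, made-up helper names get/assoc), not how any real library structures its code:

    # Hypothetical sketch of path copying in a persistent vector.
    # Nodes are tuples; only the root-to-leaf path is rebuilt on an
    # edit, so all sibling subtrees are shared between versions.
    BITS = 5                 # 2**5 = 32-way branching
    WIDTH = 1 << BITS
    MASK = WIDTH - 1

    def get(node, index, level):
        # Walk from the root down to the leaf slot holding `index`.
        while level > 0:
            node = node[(index >> level) & MASK]
            level -= BITS
        return node[index & MASK]

    def assoc(node, index, value, level):
        # Return a new root with `index` set to `value`, copying one
        # node per level; everything off the path is shared as-is.
        copy = list(node)
        if level == 0:
            copy[index & MASK] = value
        else:
            slot = (index >> level) & MASK
            copy[slot] = assoc(node[slot], index, value, level - BITS)
        return tuple(copy)

    # A depth-1 tree holding 32*32 = 1024 elements:
    root = tuple(tuple(range(i * WIDTH, (i + 1) * WIDTH)) for i in range(WIDTH))
    v2 = assoc(root, 100, -1, BITS)
    assert get(root, 100, BITS) == 100   # old version intact
    assert get(v2, 100, BITS) == -1      # new version sees the edit
    assert root[0] is v2[0]              # untouched subtrees are shared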
You can do the same thing with binary trees and other structures. It's more fiddly, and there are definitely places where single ownership allows mutability "under the hood" for real performance gains, but that's the basic idea.
You think Python developers are going to roll their own HAMT on top of frozendicts? Or are they just gonna make copies? Personally, I'd just use pyrsistent which seems to get it right.
Just ran a quick benchmark:

- Creation: 8-12x slower
- Lookup: 22-27x slower
- Contains check: 30-34x slower
- Iteration: 5-14x slower
- Merge: 32-158x slower
Except at 10k+ items, batch updates on 100K+ items, or inserting 100 keys. This is rarely the case in practice: most dictionaries and dict operations are small, and if you have a huge dict you should probably be chunking your load or delegating that to infra.
Not to mention pyrsistent's API is incompatible with dicts, so you can't pass it to external code without conversion.
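For example (a small illustration; pmap and thaw are real pyrsistent functions, and json.dumps here just stands in for any stdlib code that wants a real dict):

    import json
    from pyrsistent import pmap, thaw

    config = pmap({"host": "localhost", "port": 8080})
    # json.dumps(config)              # TypeError: PMap is not JSON serializable
    print(json.dumps(thaw(config)))   # works, but pays a full O(n) conversion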
You'd better have an incredible ROI to justify that.
Since when is Python about speed?
> Just ran a quick benchmark
Where's the code? Have you observed the bottleneck call?
> Except at 10k+ items, batch updates on 100K+ items, or inserting 100 keys.
> This is rarely the case in practice
Where are the stats on the actual practice?
> You'd better have an incredible ROI to justify that.
The ROI being fearless API design:

- multiple instances of high-level components are truly independent and can easily parallelize,
- call sites know they keep the original data intact and that callees behave within the immutability constraints,
- default function inputs and global-scope objects are immutable without having to implement another PEP,
- collections are hashable in general.
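On the last point, a quick illustration: pyrsistent's pmap and pvector are hashable, so they work as dict keys and set members where plain dicts/lists raise TypeError:

    from pyrsistent import pmap, pvector

    # {"a": 1} or [1, 2] here would raise TypeError: unhashable type
    cache = {pmap({"a": 1}): "result"}           # immutable map as a dict key
    seen = {pvector([1, 2]), pvector([1, 2])}    # deduplicates to one element
    assert len(seen) == 1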
I won't waste more of your time.
Functional data structures essentially create a new version on every write. This can be inefficient if you make writes in batches and only need immutability between the batches.
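pyrsistent addresses exactly this with evolvers (Clojure calls them transients): you mutate a private scratch view in a batch, then freeze it back. A short sketch:

    from pyrsistent import pmap

    m1 = pmap({"a": 1})
    e = m1.evolver()            # mutable scratch view; m1 is untouched
    for i in range(1000):
        e[str(i)] = i           # batched writes, cheaper than 1000 .set() calls
    m2 = e.persistent()         # freeze back to an immutable map
    assert "500" not in m1 and m2["500"] == 500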
Python already has an HAMT implementation in use by the contextvars module.
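That C HAMT isn't a public API, but essentially the same implementation is published standalone as the third-party immutables package (from the author of PEP 567). For instance:

    from immutables import Map

    m1 = Map({"a": 1, "b": 2})
    m2 = m1.set("c", 3)              # O(log32 n) path copy; m1 unchanged
    assert "c" not in m1 and m2["c"] == 3

    with m2.mutate() as mm:          # batched writes, transient-style
        mm["d"] = 4
        del mm["b"]
        m3 = mm.finish()             # back to an immutable Map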