If you want real immutable data structures, not a cheap imitation, check out pyrsistent.
Look up persistent data structures [0]
The main trick is to store everything in a tree with a large branching factor.
The elements live at the leaves. When you want to make an edit, you only need to create new nodes along the path from the root to the affected leaf, then return the new root. Everything else is shared.
Each edit is thus a logarithmic operation; implementations typically use a branching factor of 32.
It works for vectors as well as dicts. Creating a copy is constant time, and since the structure is immutable it can be shared freely (especially in GC'd languages).
Insert/delete are O(log32(n)) instead of the O(n) they'd be with a full copy.
[0] https://en.wikipedia.org/wiki/Persistent_data_structure#Pers...
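To make the path-copying idea concrete, here's a minimal sketch in Python. It's hypothetical and stripped down (fixed depth, no append/resize, tuples as nodes, made-up helper names get/assoc), not how any real library structures its code:

    # Hypothetical sketch of path copying in a persistent vector.
    # Nodes are tuples; only the root-to-leaf path is rebuilt on an
    # edit, so all sibling subtrees are shared between versions.
    BITS = 5                 # 2**5 = 32-way branching
    WIDTH = 1 << BITS
    MASK = WIDTH - 1

    def get(node, index, level):
        # Walk from the root down to the leaf slot holding `index`.
        while level > 0:
            node = node[(index >> level) & MASK]
            level -= BITS
        return node[index & MASK]

    def assoc(node, index, value, level):
        # Return a new root with `index` set to `value`, copying one
        # node per level; everything off the path is shared as-is.
        copy = list(node)
        if level == 0:
            copy[index & MASK] = value
        else:
            slot = (index >> level) & MASK
            copy[slot] = assoc(node[slot], index, value, level - BITS)
        return tuple(copy)

    # A depth-1 tree holding 32*32 = 1024 elements:
    root = tuple(tuple(range(i * WIDTH, (i + 1) * WIDTH)) for i in range(WIDTH))
    v2 = assoc(root, 100, -1, BITS)
    assert get(root, 100, BITS) == 100   # old version intact
    assert get(v2, 100, BITS) == -1      # new version sees the edit
    assert root[0] is v2[0]              # untouched subtrees are shared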
You can do the same thing with binary trees and other structures. It's more fiddly, and there are definitely places where single ownership allows mutability "under the hood" for real performance gains, but that's the basic idea.
You think Python developers are going to roll their own HAMT on top of frozendicts? Or are they just gonna make copies? Personally, I'd just use pyrsistent which seems to get it right.
Just ran a quick benchmark:

- Creation: 8-12x slower
- Lookup: 22-27x slower
- Contains check: 30-34x slower
- Iteration: 5-14x slower
- Merge: 32-158x slower
Except at 10k+ items, batch updates on 100K+ items, or inserting 100 keys. This is rarely the case in practice: most dictionaries and dict operations are small, and if you have a huge dict you should probably be chunking your load or delegating that to infra.
Not to mention pyrsistent's API is incompatible with dicts, so you can't pass it to external code without conversion.
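For example (a small illustration; pmap and thaw are real pyrsistent functions, and json.dumps here just stands in for any stdlib code that wants a real dict):

    import json
    from pyrsistent import pmap, thaw

    config = pmap({"host": "localhost", "port": 8080})
    # json.dumps(config)              # TypeError: PMap is not JSON serializable
    print(json.dumps(thaw(config)))   # works, but pays a full O(n) conversion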
You'd better have an incredible ROI to justify that.
Since when is Python about speed?
> Just ran a quick benchmark
Where's the code? Have you observed the bottleneck call?
> Except at 10k+ items, batch updates on 100K+ items, or inserting 100 keys.
> This is rarely the case in practice
Where are the stats on the actual practice?
> You'd better have an incredible ROI to justify that.
The ROI being fearless API design:

- multiple instances of high-level components are truly independent and can easily parallelize,
- call sites know they keep the original data intact and that callees behave within the immutability constraints,
- default function inputs and global-scope objects are immutable without having to implement another PEP,
- collections are hashable in general.
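On the last point, a quick illustration: pyrsistent's pmap and pvector are hashable, so they work as dict keys and set members where plain dicts/lists raise TypeError:

    from pyrsistent import pmap, pvector

    # {"a": 1} or [1, 2] here would raise TypeError: unhashable type
    cache = {pmap({"a": 1}): "result"}           # immutable map as a dict key
    seen = {pvector([1, 2]), pvector([1, 2])}    # deduplicates to one element
    assert len(seen) == 1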
I won't waste more of your time.
Functional data structures essentially create a new version on every write. This can be inefficient if you make writes in batches and only need immutability between the batches.
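pyrsistent addresses exactly this with evolvers (Clojure calls them transients): you mutate a private scratch view in a batch, then freeze it back. A short sketch:

    from pyrsistent import pmap

    m1 = pmap({"a": 1})
    e = m1.evolver()            # mutable scratch view; m1 is untouched
    for i in range(1000):
        e[str(i)] = i           # batched writes, cheaper than 1000 .set() calls
    m2 = e.persistent()         # freeze back to an immutable map
    assert "500" not in m1 and m2["500"] == 500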
Python already has an HAMT implementation in use by the contextvars module.
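That C HAMT isn't a public API, but essentially the same implementation is published standalone as the third-party immutables package (from the author of PEP 567). For instance:

    from immutables import Map

    m1 = Map({"a": 1, "b": 2})
    m2 = m1.set("c", 3)              # O(log32 n) path copy; m1 unchanged
    assert "c" not in m1 and m2["c"] == 3

    with m2.mutate() as mm:          # batched writes, transient-style
        mm["d"] = 4
        del mm["b"]
        m3 = mm.finish()             # back to an immutable Map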