259 points rbanffy | 17 comments
sgarland ◴[] No.44004897[source]
> Instead, many reach for multiprocessing, but spawning processes is expensive

Agreed.

> and communicating across processes often requires making expensive copies of data

SharedMemory [0] exists. Never understood why this isn’t used more frequently. There’s even a ShareableList which does exactly what it sounds like, and is awesome.

[0]: https://docs.python.org/3/library/multiprocessing.shared_mem...
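
A minimal sketch of the `ShareableList` usage described above, using only the standard library. Both handles live in one process here to keep it short; in real use the second handle would be opened by name from a worker process. The variable names are illustrative.

```python
from multiprocessing import shared_memory

# Create a fixed-length list backed by a shared memory block.
sl = shared_memory.ShareableList([1, 2.5, "hello", b"raw"])
try:
    # Attach a second handle by name (normally done in another process).
    other = shared_memory.ShareableList(name=sl.shm.name)
    other[0] = 42        # the write goes straight into the shared block
    first = sl[0]        # the original handle sees it, with no pickling
    other.shm.close()
finally:
    sl.shm.close()
    sl.shm.unlink()      # free the block once every user is done
```

The list's length is fixed at creation, and only ints, floats, bools, `None`, `str`, and `bytes` can be stored.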

replies(7): >>44004956 #>>44005006 #>>44006103 #>>44006145 #>>44006664 #>>44006670 #>>44007267 #
1. ogrisel ◴[] No.44005006[source]
You cannot share arbitrarily structured objects in the `ShareableList`, only atomic scalars and bytes / strings.

If you want to share structured Python objects between processes, you have to pay the cost of `pickle.dump`/`pickle.load` (CPU overhead for interprocess communication) plus the memory cost of replicated objects in each process.
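
To make both costs concrete, here is a small sketch of the pickle round trip that happens when a structured object crosses a process boundary. The `record` dict is made up for illustration, and the in-memory `dumps`/`loads` calls stand in for what the multiprocessing machinery does on each side.

```python
import pickle

record = {"id": 7, "tags": ["a", "b"], "scores": {"x": 1.5}}

payload = pickle.dumps(record)   # CPU cost in the sending process
clone = pickle.loads(payload)    # CPU cost in the receiving process

# The receiver holds an independent copy, so memory is duplicated
# and edits never reach the sender's object.
clone["tags"].append("c")
```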

replies(3): >>44006004 #>>44008341 #>>44010473 #
2. tomrod ◴[] No.44006004[source]
I can fit a lot of json into bytes/strings though?
replies(4): >>44006041 #>>44006052 #>>44007146 #>>44008154 #
3. cjbgkagh ◴[] No.44006041[source]
Perhaps flatbuffers would be better?
replies(2): >>44006072 #>>44007279 #
4. vlovich123 ◴[] No.44006052[source]
That’s even worse than pickle.
replies(1): >>44006078 #
5. tomrod ◴[] No.44006072{3}[source]
I love learning from folks on HN -- thanks! Will check it out.
replies(1): >>44008294 #
6. tomrod ◴[] No.44006078{3}[source]
pickle pickles to pickle binary, yeah? So can stream that too with an io Buffer :D
7. frollogaston ◴[] No.44007146[source]
If all your state is already json-serializable, yeah. But that's just as expensive as copying if not more, hence what cjbgkagh said about flatbuffers.
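
A rough sketch of what putting JSON into a `ShareableList` slot looks like, assuming a small made-up `state` dict. Every read parses a fresh local copy, and a slot cannot grow past the storage allocated for it at creation.

```python
import json
from multiprocessing import shared_memory

state = {"user": "ada", "visits": 3}
sl = shared_memory.ShareableList([json.dumps(state)])
try:
    decoded = json.loads(sl[0])       # parsing builds a private copy
    decoded["visits"] += 1            # changes only that copy
    still_shared = json.loads(sl[0])  # the shared slot is unchanged

    try:
        # A much longer document does not fit in the original slot.
        sl[0] = json.dumps({**state, "history": list(range(100))})
        grew = True
    except ValueError:
        grew = False
finally:
    sl.shm.close()
    sl.shm.unlink()
```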
replies(1): >>44010306 #
8. tinix ◴[] No.44007279{3}[source]
let me introduce you to quickle.
9. reliabilityguy ◴[] No.44008154[source]
What’s the point? The whole idea is to share objects, not to serialize them, whether it’s JSON, pickle, or whatever.
replies(1): >>44008726 #
10. notpushkin ◴[] No.44008294{4}[source]
Take a look at https://capnproto.org/ as well, while at it.

Neither solves the copying problem, though.

replies(2): >>44010278 #>>44010304 #
11. notpushkin ◴[] No.44008341[source]
We need a dataclass-like interface on top of a ShareableList.
replies(1): >>44010745 #
12. tomrod ◴[] No.44008726{3}[source]
I mean, the answer to this is pretty straightforward -- because we can, not because we should :)
13. ◴[] No.44010278{5}[source]
14. frollogaston ◴[] No.44010304{5}[source]
Ah, I forgot capnproto doesn't let you edit a serialized proto in-memory, it's read-only. In theory this should be possible as long as you're not changing the length of anything, but I'm not surprised such trickery is unsupported.

So this doesn't seem like a versatile solution for sharing data structs between two Python processes. You're gonna have to reserialize the whole thing if one side wants to edit, which is basically copying.
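
For comparison, the "edit in place as long as nothing changes length" idea can be done by hand with `struct` over a raw `SharedMemory` block. This is a sketch with a hypothetical fixed-width record layout, not something Cap'n Proto or FlatBuffers provides.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: int64 id, float64 score, 16-byte name field.
RECORD = struct.Struct("<qd16s")

shm = shared_memory.SharedMemory(create=True, size=RECORD.size)
try:
    RECORD.pack_into(shm.buf, 0, 1, 0.5, b"alice")

    # A peer (normally another process) attaches by name.
    peer = shared_memory.SharedMemory(name=shm.name)
    # Overwrite only the score field at byte offset 8, in place.
    struct.pack_into("<d", peer.buf, 8, 0.75)
    peer.close()

    rec_id, score, raw_name = RECORD.unpack_from(shm.buf, 0)
    name = raw_name.rstrip(b"\x00")   # strip the fixed-width padding
finally:
    shm.close()
    shm.unlink()
```

This only works for fixed-width fields, and it does nothing about synchronizing concurrent writers.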

15. frollogaston ◴[] No.44010306{3}[source]
oh nvm, that doesn't solve this either
16. sgarland ◴[] No.44010473[source]
So don’t do that? Send data to workers as primitives, and have a separate process that reads the results and serializes them into whatever form you want.
17. notpushkin ◴[] No.44010745[source]
Actually, ShareableList feels like a tuple really (as it’s impossible to change its length). If we could mix ShareableList and collections.namedtuple together, it would get us 90% there (99.9% if we use typing.NamedTuple). Unfortunately, I can’t decipher either one [1, 2] at first glance; maybe if I get some more sleep?

[1]: https://github.com/python/cpython/blob/3.13/Lib/collections/...

[2]: https://github.com/python/cpython/blob/3.13/Lib/typing.py#L2...
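
One way such a mix could look, as a rough sketch only: `shared_record` is a hypothetical factory (not part of the standard library, and not how `namedtuple` is implemented) that generates one property per field on top of a `ShareableList`.

```python
from multiprocessing import shared_memory

def shared_record(typename, field_names):
    """Hypothetical helper: named-field access over a ShareableList."""
    fields = tuple(field_names)

    class Record:
        _fields = fields

        def __init__(self, *values, name=None):
            if name is None:
                if len(values) != len(fields):
                    raise TypeError(f"expected {len(fields)} values")
                self._sl = shared_memory.ShareableList(values)
            else:
                # Attach to an existing block created elsewhere.
                self._sl = shared_memory.ShareableList(name=name)

        @property
        def shm_name(self):
            return self._sl.shm.name

        def close(self):
            self._sl.shm.close()

        def unlink(self):
            self._sl.shm.unlink()

    def make_property(index):
        # Each field reads and writes one slot of the shared list.
        return property(
            lambda self: self._sl[index],
            lambda self, value: self._sl.__setitem__(index, value),
        )

    for i, field in enumerate(fields):
        setattr(Record, field, make_property(i))
    Record.__name__ = typename
    return Record

Point = shared_record("Point", ["x", "y", "label"])
p = Point(1.0, 2.0, "origin")
try:
    q = Point(name=p.shm_name)   # normally opened in another process
    q.x = 10.0
    seen = (p.x, p.y, p.label)
    q.close()
finally:
    p.close()
    p.unlink()
```

Unlike a real namedtuple, the fields here are mutable, and the same type and size limits as `ShareableList` apply.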