sgarland ◴[] No.44004897[source]
> Instead, many reach for multiprocessing, but spawning processes is expensive

Agreed.

> and communicating across processes often requires making expensive copies of data

SharedMemory [0] exists. Never understood why this isn’t used more frequently. There’s even a ShareableList which does exactly what it sounds like, and is awesome.

[0]: https://docs.python.org/3/library/multiprocessing.shared_mem...
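
For illustration, a minimal sketch of what that looks like (names and values here are just placeholders): a ShareableList lives in a named shared-memory block, and a child process attaches to it by name instead of receiving a pickled copy.

  from multiprocessing import Process
  from multiprocessing.shared_memory import ShareableList

  def worker(block_name):
      # Attach to the existing shared block by name; nothing is copied.
      sl = ShareableList(name=block_name)
      sl[0] = sl[0] * 2          # mutation is visible to the parent
      sl.shm.close()

  if __name__ == "__main__":
      sl = ShareableList([21, "hello", 3.14])
      p = Process(target=worker, args=(sl.shm.name,))
      p.start()
      p.join()
      print(sl[0])               # 42
      sl.shm.close()
      sl.shm.unlink()            # release the shared block

The main caveats are fixed-size slots (a ShareableList can't grow) and having to unlink the block yourself when you're done.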

chubot ◴[] No.44006145[source]
Spawning processes generally takes much less than 1 ms on Unix

Spawning a PYTHON interpreter process might take 30 ms to 300 ms before you get to main(), depending on the number of imports

That's a difference of 1 to 2 orders of magnitude, so it's worth being precise.

Conflating the two leads to a fallacy about, say, CGI. A CGI program written in C, Rust, or Go works perfectly well.

e.g. sqlite.org runs with a process PER REQUEST - https://news.ycombinator.com/item?id=3036124
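
A rough way to see the gap for yourself on a Unix machine (timings are illustrative and will vary): fork the existing process versus launch a fresh interpreter.

  import os, subprocess, sys, time

  def time_fork():
      t0 = time.perf_counter()
      pid = os.fork()
      if pid == 0:
          os._exit(0)            # child exits immediately
      os.waitpid(pid, 0)
      return time.perf_counter() - t0

  def time_fresh_interpreter():
      t0 = time.perf_counter()
      subprocess.run([sys.executable, "-c", "pass"])
      return time.perf_counter() - t0

  print(f"fork + exit:       {time_fork() * 1000:.2f} ms")
  print(f"fresh interpreter: {time_fresh_interpreter() * 1000:.2f} ms")

Add a few heavy imports to the -c string and the second number grows quickly; the first one doesn't.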

morningsam ◴[] No.44009805[source]
>Spawning a PYTHON interpreter process might take 30 ms to 300 ms

Which is why, at least on Linux, Python's multiprocessing doesn't do that but fork()s the interpreter, which takes low-single-digit ms as well.
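
If you want to see (or control) which behavior you're getting, the start method is explicit. A small sketch, noting that 'fork' is only available on Unix:

  import multiprocessing as mp
  import time

  def noop(_):
      return None

  if __name__ == "__main__":
      for method in ("fork", "spawn"):    # 'fork' is Unix-only
          ctx = mp.get_context(method)
          t0 = time.perf_counter()
          with ctx.Pool(4) as pool:
              pool.map(noop, range(4))
          print(f"{method}: {time.perf_counter() - t0:.3f} s to start and use 4 workers")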

zahlman ◴[] No.44010280[source]
Even when the 'spawn' strategy is used (default on Windows, and can be chosen explicitly on Linux), the overhead can largely be avoided. (Why choose it on Linux? Apparently forking can cause problems if you also use threads.)

Python imports can be deferred (`import` is a statement, not a compiler or pre-processor directive), and child processes (regardless of the creation strategy) name the main module as `__mp_main__` rather than `__main__`, allowing the programmer to distinguish. Being able to distinguish is of course necessary here, to avoid making a fork bomb, since the top-level code runs automatically and `if __name__ == '__main__':` is normally top-level code.

But also keep in mind that cleanup for a Python process takes time too, and that time is harder to trace.

Refs:

https://docs.python.org/3/library/multiprocessing.html#conte...

https://stackoverflow.com/questions/72497140
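
A small sketch of both points together (the deferred import here is a cheap stand-in; substitute whatever actually dominates your startup): the import moves into the worker function, and the `__name__` guard keeps process creation out of the module that spawn children re-import as `__mp_main__`.

  import multiprocessing as mp

  def worker(n):
      import statistics          # deferred; stand-in for an expensive import
      return statistics.mean(range(n))

  if __name__ == "__main__":     # False in children, whose __name__ is "__mp_main__"
      mp.set_start_method("spawn")
      with mp.Pool(2) as pool:
          print(pool.map(worker, [10, 100, 1000]))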

kstrauser ◴[] No.44010625[source]
I really wish Python had a way to annotate things you don't care about cleaning up. I don't know what the API would look like, but I imagine something like:

  l = list(cleanup=False)
  for i in range(1_000_000_000): l.append(i)
telling the runtime that it doesn't need to individually free each of those tiny objects, and can just let the OS's process teardown reclaim the whole thing at once.

Sure, close TCP connections before you kill the whole thing. I couldn't care less about most objects, though.
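
There's no such annotation, but a rough approximation exists today, assuming you genuinely don't need destructors or atexit handlers to run: disable the cyclic GC while building the data, close the few things that matter explicitly, and skip interpreter teardown entirely with os._exit(), letting the OS reclaim the whole address space in one go.

  import gc
  import os

  gc.disable()                   # turn off the cycle collector (refcounting still runs)
  big = [(i, i) for i in range(5_000_000)]

  # ... use `big`; explicitly flush/close anything that actually matters ...

  os._exit(0)                    # exit now: no finalizers, no atexit handlers, no per-object teardown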

1. Too ◴[] No.44012256[source]
I've never experienced this. If it's truly a problem, here is a sledgehammer; just beware it will not close your TCP connections gracefully: os.kill(os.getpid(), signal.SIGKILL).
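
Spelled out with its imports (same caveat: nothing, including buffered output, gets flushed or closed):

  import os
  import signal

  # ... do the work, explicitly close anything that must be closed ...
  os.kill(os.getpid(), signal.SIGKILL)   # the process dies here; no cleanup runs

os._exit(0) gets you much the same effect without involving signals.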