
255 points rbanffy | 17 comments
1. pansa2 ◴[] No.44004148[source]
Does removal of the GIL have any other effects on multi-threaded Python code (other than allowing it to run in parallel)?

My understanding is that the GIL has lasted this long not because multi-threaded Python depends on it, but because removing it:

- Complicates the implementation of the interpreter

- Complicates C extensions, and

- Causes single-threaded code to run slower

Multi-threaded Python code already has to assume that it can be pre-empted on the boundary between any two bytecode instructions. Does free-threaded Python provide the same guarantees, or does it require multi-threaded Python to be written differently, e.g. to use additional locks?
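
The guarantee in question can be made concrete with a small sketch (illustrative, not from the thread): a read-modify-write like `counter += 1` compiles to several bytecode instructions, so it can be interleaved at an instruction boundary under the GIL, and at any point under free-threading. Either way, correct code needs a lock around the whole sequence.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:  # protects the load / add / store sequence as one unit
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- deterministic only because of the lock
```

Drop the `with lock:` and the final count can come up short on both GIL and free-threaded builds; the lock requirement itself is not new.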

replies(4): >>44004334 #>>44004386 #>>44007874 #>>44011862 #
2. rfoo ◴[] No.44004334[source]
> Does free-threaded Python provide the same guarantees

Mostly. Some of the "can be pre-empted on the boundary between any two bytecode instructions" bugs are really hard to hit without free-threading, though. And without free-threading people don't use as much threading stuff. So by nature it exposes more bugs.

Now, my rants:

> have any other effects on multi-threaded Python code

It frees people from multi-process workarounds and hence simplifies user code. IMO that's totally worth a more complex interpreter.

> Complicates C extensions

The alternative (sub-interpreters) complicates C extensions more than free-threading does, and the single most important C extension in the entire ecosystem, numpy, stated that they can't and don't want to support sub-interpreters. By contrast, they already support free-threading today and are actively sorting out the remaining bugs.

> Causes single-threaded code to run slower

That's the trade-off. Personally I think a single-digit percentage slowdown of single-threaded code is worth it.

replies(2): >>44005969 #>>44006545 #
3. jacob019 ◴[] No.44004386[source]
Your understanding is correct. You can use all the cores, but it's much slower per thread, and existing libraries may need to be reworked. I tried it with PyTorch: it used 10x more CPU to do half the work. I expect these issues to improve; still, it's great to see after 20 years of wishing for it.
4. celeritascelery ◴[] No.44005969[source]
> That's the trade-off. Personally I think a single-digit percentage slowdown of single-threaded code is worth it.

Maybe. I would expect that 99% of python code going forward will still be single threaded. You just don’t need that extra complexity for most code. So I would expect that python code as a whole will have worse performance, even though a handful of applications will get faster.

replies(3): >>44006258 #>>44006881 #>>44007872 #
5. pphysch ◴[] No.44006258{3}[source]
But the bar to parallelizing code gets much lower, in theory. Your serial code got 5% slower but has a direct path to being 50% faster.
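
That "5% slower but 50% faster" intuition can be put on Amdahl's-law footing; the numbers below (5% single-thread tax, 4 cores, 90% of the workload parallelizable) are illustrative assumptions, not measurements from the thread.

```python
def speedup(parallel_fraction, cores, single_thread_tax=0.05):
    """Speedup relative to the old GIL build (whose runtime = 1.0)."""
    serial_time = 1 + single_thread_tax  # every instruction pays the tax
    parallel_time = serial_time * (
        (1 - parallel_fraction) + parallel_fraction / cores
    )
    return 1 / parallel_time

print(round(speedup(0.90, 4), 2))  # -> 2.93: big win despite the tax
print(round(speedup(0.0, 4), 2))   # -> 0.95: pure serial code just eats the 5%
```

So fully serial code loses the tax and nothing else, while even modestly parallelizable code comes out well ahead.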

And if there's a good free-threaded HTTP server implementation, the RPS of "Python code as a whole" could increase dramatically.

replies(2): >>44006825 #>>44007196 #
6. rocqua ◴[] No.44006545[source]
Note that there is an entire order of magnitude range for a 'single digit'.

A 1% slowdown seems totally fine. A 9% slowdown is pretty bad.

replies(1): >>44011869 #
7. weakfish ◴[] No.44006825{4}[source]
Is there any news from FastAPI folks and/or Gunicorn on their support?
8. rfoo ◴[] No.44006881{3}[source]
That's the mindset that leads to the funny result that `uv pip` is like 10x faster than `pip`.

Is it because Rust is just fast? Nope. For anything after resolving dependency versions, raw CPU performance doesn't matter at all. It's that writing concurrent plus parallel code in Rust is easier: no spawning a few processes and waiting for the interpreter to start in each, no constantly serializing whatever shit you want to run. So someone did it!

Yet, there's a pip maintainer who actively sabotages free-threading work. Nice.

replies(1): >>44008415 #
9. fjasdfas ◴[] No.44007196{4}[source]
You can do multiple processes with SO_REUSEPORT.

free-threaded makes sense if you need shared state.
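
For reference, the SO_REUSEPORT pattern mentioned above looks like this sketch: each worker process binds its own listening socket to the same port, and the kernel load-balances incoming connections across them (Linux 3.9+; not available on Windows). Here two sockets in one process stand in for two workers.

```python
import socket

def reuseport_listener(port=0):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); lets multiple sockets share one port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen()
    return s

a = reuseport_listener()                     # kernel picks a free port
b = reuseport_listener(a.getsockname()[1])   # second "worker" shares it
shared_port_ok = a.getsockname()[1] == b.getsockname()[1]
print(shared_port_ok)  # True
a.close()
b.close()
```

It scales out stateless request handling nicely, but, as the comment says, any state still lives per-process.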

replies(1): >>44009777 #
10. foresto ◴[] No.44007872{3}[source]
As I recall, CPython has also been getting speed-ups lately, which ought to make up for the minor single-threaded performance loss introduced by free threading. With that in mind, the recent changes seem like an overall win to me.
replies(1): >>44011230 #
11. btilly ◴[] No.44007874[source]
It makes race conditions easier to hit, and that will require multi-threaded Python to be written with more care to achieve the same level of reliability.
12. notpushkin ◴[] No.44008415{4}[source]
> Yet, there's a pip maintainer who actively sabotages free-threading work.

Wow. Could you elaborate?

13. pphysch ◴[] No.44009777{5}[source]
Any webserver that wants to cache and reuse content cares about shared state, but usually has to outsource that to a shared in-memory database because the language can't support it.
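
The in-process alternative that free-threading enables might look like this sketch: a tiny thread-safe cache that parallel workers share directly instead of round-tripping to an external store. The `Cache` class and names here are illustrative, not a real library API.

```python
import threading

class Cache:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get_or_set(self, key, factory):
        # Lock covers the check-then-insert so two threads can't
        # both run factory() for the same key.
        with self._lock:
            if key not in self._data:
                self._data[key] = factory()
            return self._data[key]

cache = Cache()
calls = []

def render():
    calls.append(1)  # count how often the "expensive" work runs
    return "<html>cached page</html>"

page1 = cache.get_or_set("/home", render)
page2 = cache.get_or_set("/home", render)
print(len(calls))  # 1 -- rendered once, served twice
```

With the GIL, multi-process servers can't share a dict like this, which is exactly why the work gets outsourced to Redis-style stores.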
replies(1): >>44011868 #
14. celeritascelery ◴[] No.44011230{4}[source]
It’s not either/or. The CPython speedups would be even better with the single-threaded interpreter.
15. monkeyelite ◴[] No.44011862[source]
Yes: it makes every part of the ecosystem more complex and prone to bugs, in hopes of getting more performance out of a scripting language.
16. monkeyelite ◴[] No.44011868{6}[source]
And most web servers already need in memory databases for other things. And it’s a great design principle - use sharp focused tools.
17. monkeyelite ◴[] No.44011869{3}[source]
If so, then why use python?