
255 points rbanffy | 17 comments
1. pansa2 ◴[] No.44004148[source]
Does removal of the GIL have any other effects on multi-threaded Python code (other than allowing it to run in parallel)?

My understanding is that the GIL has lasted this long not because multi-threaded Python depends on it, but because removing it:

- Complicates the implementation of the interpreter

- Complicates C extensions, and

- Causes single-threaded code to run slower

Multi-threaded Python code already has to assume that it can be pre-empted on the boundary between any two bytecode instructions. Does free-threaded Python provide the same guarantees, or does it require multi-threaded Python to be written differently, e.g. to use additional locks?
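
The guarantee in question can be made concrete with a small sketch (illustrative, not from the thread): a read-modify-write like `counter += 1` compiles to several bytecode instructions, so it can be interleaved at an instruction boundary under the GIL, and at any point under free-threading. Either way, correct code needs a lock around the whole sequence.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:  # protects the load / add / store sequence as one unit
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 -- deterministic only because of the lock
```

Drop the `with lock:` and the final count can come up short on both GIL and free-threaded builds; the lock requirement itself is not new.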

replies(4): >>44004334 #>>44004386 #>>44007874 #>>44011862 #
2. rfoo ◴[] No.44004334[source]
> Does free-threaded Python provide the same guarantees

Mostly. Some of the "can be pre-empted on the boundary between any two bytecode instructions" bugs are really hard to hit without free-threading, though. And without free-threading people don't use as much threading stuff. So by nature it exposes more bugs.

Now, my rants:

> have any other effects on multi-threaded Python code

It frees people from multi-process workarounds and hence simplifies user code. IMO that's totally worth a more complex interpreter.

> Complicates C extensions

The alternative (sub-interpreters) complicates C extensions more than free-threading does, and the single most important C extension in the entire ecosystem, numpy, stated that they can't and don't want to support sub-interpreters. By contrast, they already support free-threading today and are actively sorting out the remaining bugs.

> Causes single-threaded code to run slower

That's the trade-off. Personally I think a single-digit percentage slowdown of single-threaded code is worth it.

replies(2): >>44005969 #>>44006545 #
3. jacob019 ◴[] No.44004386[source]
Your understanding is correct. You can use all the cores, but it's much slower per thread, and existing libraries may need to be reworked. I tried it with PyTorch: it used 10x more CPU to do half the work. I expect these issues to improve; still, it's great to see after 20 years of wishing for it.
4. celeritascelery ◴[] No.44005969[source]
> That's the trade-off. Personally I think a single-digit percentage slowdown of single-threaded code is worth it.

Maybe. I would expect that 99% of python code going forward will still be single threaded. You just don’t need that extra complexity for most code. So I would expect that python code as a whole will have worse performance, even though a handful of applications will get faster.

replies(3): >>44006258 #>>44006881 #>>44007872 #
5. pphysch ◴[] No.44006258{3}[source]
But the bar to parallelizing code gets much lower, in theory. Your serial code got 5% slower but has a direct path to being 50% faster.
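
That "5% slower but 50% faster" intuition can be put on Amdahl's-law footing; the numbers below (5% single-thread tax, 4 cores, 90% of the workload parallelizable) are illustrative assumptions, not measurements from the thread.

```python
def speedup(parallel_fraction, cores, single_thread_tax=0.05):
    """Speedup relative to the old GIL build (whose runtime = 1.0)."""
    serial_time = 1 + single_thread_tax  # every instruction pays the tax
    parallel_time = serial_time * (
        (1 - parallel_fraction) + parallel_fraction / cores
    )
    return 1 / parallel_time

print(round(speedup(0.90, 4), 2))  # -> 2.93: big win despite the tax
print(round(speedup(0.0, 4), 2))   # -> 0.95: pure serial code just eats the 5%
```

So fully serial code loses the tax and nothing else, while even modestly parallelizable code comes out well ahead.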

And if there's a good free-threaded HTTP server implementation, the RPS of "Python code as a whole" could increase dramatically.

replies(2): >>44006825 #>>44007196 #
6. rocqua ◴[] No.44006545[source]
Note that there is an entire order of magnitude range for a 'single digit'.

A 1% slowdown seems totally fine. A 9% slowdown is pretty bad.

replies(1): >>44011869 #
7. weakfish ◴[] No.44006825{4}[source]
Is there any news from FastAPI folks and/or Gunicorn on their support?
8. rfoo ◴[] No.44006881{3}[source]
That's the mindset that leads to the funny result that `uv pip` is like 10x faster than `pip`.

Is it because Rust is just fast? Nope. For anything after resolving dependency versions, raw CPU performance doesn't matter at all. It's that writing concurrent plus parallel code in Rust is easier: no spawning a few processes and waiting for the interpreter to start in each, no constantly serializing whatever shit you want to run. So someone did it!

Yet, there's a pip maintainer who actively sabotages free-threading work. Nice.

replies(1): >>44008415 #
9. fjasdfas ◴[] No.44007196{4}[source]
You can do multiple processes with SO_REUSEPORT.

free-threaded makes sense if you need shared state.
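
For reference, the SO_REUSEPORT pattern mentioned above looks like this sketch: each worker process binds its own listening socket to the same port, and the kernel load-balances incoming connections across them (Linux 3.9+; not available on Windows). Here two sockets in one process stand in for two workers.

```python
import socket

def reuseport_listener(port=0):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(); lets multiple sockets share one port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen()
    return s

a = reuseport_listener()                     # kernel picks a free port
b = reuseport_listener(a.getsockname()[1])   # second "worker" shares it
shared_port_ok = a.getsockname()[1] == b.getsockname()[1]
print(shared_port_ok)  # True
a.close()
b.close()
```

It scales out stateless request handling nicely, but, as the comment says, any state still lives per-process.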

replies(1): >>44009777 #
10. foresto ◴[] No.44007872{3}[source]
As I recall, CPython has also been getting speed-ups lately, which ought to make up for the minor single-threaded performance loss introduced by free threading. With that in mind, the recent changes seem like an overall win to me.
replies(1): >>44011230 #
11. btilly ◴[] No.44007874[source]
It makes race conditions easier to hit, and that will require multi-threaded Python to be written with more care to achieve the same level of reliability.
12. notpushkin ◴[] No.44008415{4}[source]
> Yet, there's a pip maintainer who actively sabotages free-threading work.

Wow. Could you elaborate?

13. pphysch ◴[] No.44009777{5}[source]
Any webserver that wants to cache and reuse content cares about shared state, but usually has to outsource that to a shared in-memory database because the language can't support it.
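
The in-process alternative that free-threading enables might look like this sketch: a tiny thread-safe cache that parallel workers share directly instead of round-tripping to an external store. The `Cache` class and names here are illustrative, not a real library API.

```python
import threading

class Cache:
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get_or_set(self, key, factory):
        # Lock covers the check-then-insert so two threads can't
        # both run factory() for the same key.
        with self._lock:
            if key not in self._data:
                self._data[key] = factory()
            return self._data[key]

cache = Cache()
calls = []

def render():
    calls.append(1)  # count how often the "expensive" work runs
    return "<html>cached page</html>"

page1 = cache.get_or_set("/home", render)
page2 = cache.get_or_set("/home", render)
print(len(calls))  # 1 -- rendered once, served twice
```

With the GIL, multi-process servers can't share a dict like this, which is exactly why the work gets outsourced to Redis-style stores.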
replies(1): >>44011868 #
14. celeritascelery ◴[] No.44011230{4}[source]
It’s not either/or. The CPython speedups would be even better with the single-threaded interpreter.
15. monkeyelite ◴[] No.44011862[source]
Yes: it makes every part of the ecosystem more complex and prone to bugs, in hopes of getting more performance out of a scripting language.
16. monkeyelite ◴[] No.44011868{6}[source]
And most web servers already need in memory databases for other things. And it’s a great design principle - use sharp focused tools.
17. monkeyelite ◴[] No.44011869{3}[source]
If so, then why use python?