Of course it would be nice to just write Python and have everything accelerated 12x, but I don't see how that could come without drawbacks that interfere with what makes Python so approachable.
With the critical mass Python acquired over the years, the GIL becomes a very sore bottleneck in some cases. This is why I decided to learn Go, for example: a properly threaded (and green-threaded) programming language that is higher level than C/C++ but lower level than Python, and that lets me do things I can't do with Python. Compilation is another reason, but it was secondary to threading.
https://www.youtube.com/watch?v=_9B__0S21y8 is fairly concise and gives some recommendations for literature and techniques. It obviously makes an effort to promote PlusCal/TLA+ along the way, but it showcases how even apparently simple algorithms can be problematic, and how deep the analysis has to go to get a guarantee that execution will be bug-free.
My understanding is that the GIL has lasted this long not because multi-threaded Python depends on it, but because removing it:
- Complicates the implementation of the interpreter
- Complicates C extensions, and
- Causes single-threaded code to run slower
Multi-threaded Python code already has to assume that it can be pre-empted on the boundary between any two bytecode instructions. Does free-threaded Python provide the same guarantees, or does it require multi-threaded Python to be written differently, e.g. to use additional locks?
Of course, while the transcription is in action the rest of the UI (Qt via Pyside) should remain usable. And multiple transcription requests should be supported - I'm thinking of a pool of transcription threads, but I'm uncertain how many to allocate. Half the quantity of CPUs? All the CPUs under 50% load?
Advice welcome!
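One possible starting point for the pool-sizing question, as a minimal sketch rather than a recommendation. It uses only the standard library; `transcribe`, `pool_size` and `request_transcription` are hypothetical names, and "half the CPUs" is just one of the options floated above, not a measured optimum.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def transcribe(path):
    """Hypothetical stand-in for the real transcription call."""
    return f"transcript of {path}"


def pool_size():
    # Assumption: half the logical CPUs, but never fewer than one worker.
    return max(1, (os.cpu_count() or 2) // 2)


# One shared pool for all requests; extra requests simply wait in its queue.
executor = ThreadPoolExecutor(max_workers=pool_size(),
                              thread_name_prefix="transcribe")


def request_transcription(path, on_done):
    """Submit a job and return at once, so the calling (UI) thread is not blocked."""
    future = executor.submit(transcribe, path)
    # The callback normally runs in the worker thread. A Qt application would
    # forward the result to the GUI thread (e.g. through a signal) instead of
    # touching widgets from here.
    future.add_done_callback(lambda f: on_done(f.result()))
    return future
```

Because the pool is bounded, a burst of requests queues up instead of oversubscribing the machine; the right bound is something to measure on the target hardware.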
What changes for you? Nothing unless you start using threads. You probably weren't using threads anyway because there is little to no point in using them in Python. Most Python code bases completely ignore the threading module and instead use non-blocking IO, async, or similar things. The GIL thing only kicks in if you actually use threads.
If you don't use threads, removing the GIL changes nothing. There's no code that will break. All those C libraries that aren't thread safe are still single threaded, etc. Only if you now start using threads do you need to pay attention.
There's some threaded python code of course that people may have written in python somewhat naively in the hope that it would make things faster that is constantly hitting the GIL and is effectively single threaded. That code now might run a little faster. And probably with more bugs because naive threaded code tends to have those.
But a simple solution to address your fears: simply don't use threads. You'll be fine.
Or learn how to use threads. Because now you finally can and it isn't that hard if you have the right abstractions. I'm sure those will follow in future releases. Structured concurrency is probably high on the agenda of some people in the community.
Mostly. Some of the "can be pre-empted on the boundary between any two bytecode instructions" bugs are really hard to hit without free-threading, though. And without free-threading people don't use as much threading stuff. So by nature it exposes more bugs.
Now, my rants:
> have any other effects on multi-threaded Python code
It stops people from using multi-process workarounds. Hence, it simplifies user-code. IMO totally worth it to make the interpreter more complex.
> Complicates C extensions
The alternative (sub-interpreters) complicates C extensions more than free-threading does, and the single most important C extension in the entire ecosystem, numpy, has stated that they can't and don't want to support sub-interpreters. On the contrary, they already support free-threading today and are actively sorting out the remaining bugs.
> Causes single-threaded code to run slower
That's the trade-off. Personally I think a single-digit percentage slowdown of single-threaded code is worth it.
How are 'concurrent.futures' users impacted? What will I need to change moving forward?
Things they won't tell you at PyCon.
I'm not worried about new code. I'm worried about stuff written 15 years ago by a monkey who had no idea how threads work and just read something on Stack Overflow that said to use threading. This code will likely break when run post-GIL. I suspect there is actually quite a bit of it.
https://www.linkedin.com/posts/mdboom_its-been-a-tough-coupl...
Let's see what performance improvements still land in CPython, unless another company sponsors the work.
I guess Facebook (no need to correct me on the name) is still sponsoring part of it.
Most C extensions that will break are not written by monkeys, but by conscientious developers that followed best practices.
Older code will break, but they break all the time. A language changes how something behaves in a new revision, suddenly 20 year old bedrock tools are getting massively patched to accommodate both new and old behavior.
Is it painful, ugly, unpleasant? Yes, yes and yes. However change is inevitable, because some of the behavior was rooted in inability to do some things with current technology, and as hurdles are cleared, we change how things work.
My father's friend told me that length of a variable's name used to affect compile/link times. Now we can test whether we have memory leaks in Rust. That thing was impossible 15 years ago due to performance of the processors.
    def f(x):
        for _ in range(N):
            l.append(x)
I've tried it out and they start interleaving when N is set to 1000000.
Additionally, at this stage the severe political and governance problems cannot have escaped Microsoft. I imagine that no competent Microsoft employee wants to give his expertise to CPython, only later to suffer group defamation from a couple of elected mediocre people.
CPython is an organization that overpromises, allocates jobs to the obedient and faithful while weeding out competent dissenters.
It wasn't always like that. The issues are entirely self-inflicted.
Agreed.
> and communicating across processes often requires making expensive copies of data
SharedMemory [0] exists. Never understood why this isn’t used more frequently. There’s even a ShareableList which does exactly what it sounds like, and is awesome.
[0]: https://docs.python.org/3/library/multiprocessing.shared_mem...
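For anyone who hasn't tried it, here is a rough sketch of `ShareableList` being modified by a second process without pickling the data. The function names `double_in_place` and `demo` are made up for illustration; only the block's name crosses the process boundary.

```python
from multiprocessing import Process
from multiprocessing.shared_memory import ShareableList


def double_in_place(name):
    # Attach to the existing block by name; the list contents are not copied.
    shared = ShareableList(name=name)
    for i in range(len(shared)):
        shared[i] = shared[i] * 2
    shared.shm.close()


def demo():
    shared = ShareableList([1, 2, 3, 4])
    try:
        worker = Process(target=double_in_place, args=(shared.shm.name,))
        worker.start()
        worker.join()
        return list(shared)  # the parent sees the child's writes
    finally:
        shared.shm.close()
        shared.shm.unlink()  # release the block once nobody needs it


if __name__ == "__main__":
    print(demo())
```

Note that `ShareableList` only holds a fixed number of simple values (ints, floats, bools, short strings and bytes, `None`), so it is not a drop-in replacement for arbitrary Python objects.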
I feel some trepidation about threads, but at least for debugging purposes there's only one process to attach to.
Use SharedMemory to pass the data back and forth.
If you want to share structured Python objects between instances, you have to pay the cost of `pickle.dump`/`pickle.load` (CPU overhead for interprocess communication) + the memory cost of replicated objects in the processes.
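To make that cost concrete, a small sketch of the round trip an object goes through when it is sent over a `multiprocessing` queue or pipe. The `records` data and the `roundtrip` helper are invented for illustration.

```python
import pickle


def roundtrip(obj):
    """Serialize and deserialize, as happens when an object crosses a process boundary."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    clone = pickle.loads(payload)
    return payload, clone


# Hypothetical structured data.
records = [{"id": i, "tags": ["a", "b"]} for i in range(1000)]
payload, clone = roundtrip(records)

# The receiver holds an equal but independent copy: its changes stay local.
clone[0]["id"] = -1
print(len(payload), "bytes serialized; original id is still", records[0]["id"])
```

Both the CPU time for the two pickle calls and the second copy in memory grow with the size of the object graph.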
The only code that is going to break because of "No GIL" is C extensions, and for very obvious reasons: you can now call into C code from multiple threads, which wasn't possible before. Python code could always be called from multiple Python threads, even in the presence of the GIL.
In a language conceived for this kind of work it's not as easy as you'd like. In most languages you're going to write nonsense which has no coherent meaning whatsoever. Experiments show that humans can't successfully understand non-trivial programs unless they exhibit Sequential Consistency - that is, they can be understood as if (which is not reality) all the things which happen do happen in some particular order. This is not the reality of how the machine works, for subtle reasons, but without it merely human programmers are like "Eh, no idea, I guess everything is computer?". It's really easy to write concurrent programs which do not satisfy this requirement in most of these languages, you just can't debug them or reason about what they do - a disaster.
As I understand it Python without the GIL will enable more programs that lose SC.
A few decades ago MS did indeed have a playbook which they used to undermine open standards. Laying off some members of the Python team bears no resemblance whatsoever to that. At worst it will delay the improvement of free-threaded Python. That's all.
Your comment is lazy and unfounded.
Well, you sure managed to avoid that by setting up camp on that hill. Kudos on so much time saved.
> For me Facebook will always be Facebook, and Twitter will always be Twitter.
Well, for me the product will always be "Thefacebook", but that's since I haven't used it since. But I do respect that there's a company running it now that does more stuff and contributes to open source projects.
Decent threading is awesome news, but it only affects a small minority of use cases. Threads are only strictly necessary when it's prohibitive to message pass. The Python ecosystem these days includes a playbook solution for literally any such case. Considering the multiple major pitfalls of threads (i.e., locking), they are likely to become a thing useful only in specific libraries/domains and not as a general-purpose tool.
Additionally, with all my love to vanilla Python, anyone who needs to squeeze the juice out of their CPU (which is actually memory bandwidth) has plenty of other tools -- off-the-shelf libraries written in native code. (Honorable mention to PyPy, numba and such).
Finally, the one dramatic performance innovation in Python has been async programming - I warmly encourage everyone not familiar with it to consider taking a look.
No it does not. I hate that analogy so much because it leads to such bad behavior. Software is a digital artifact that does not degrade. With the right attitude, you'd be able to execute the same binary on new machines for as long as you desired. That is not true of organic matter that actually rots.
The only reason we need to change software is that we trade that off against something else. Instructions are reworked, because chasing the universal Turing machine takes a few sacrifices. If all software has to run on the same hardware, those two artifacts have to have a dialogue about what they need from each other.
If we didn't want the universal machine to do anything new, and if we had a valuable product, we could just keep making the machine that executes that product. It never rots.
* VSCode got popular and they started preventing forks from installing its extensions.
* They extended the Free Source pyright language server into the proprietary pylance. They don’t even sell it. It’s just there to make the FOSS version less useful.
* They bought GitHub and started rate limiting it for visitors who aren't logged in.
Every time Microsoft touches a thing, they end up locking it down. They can’t help it. It’s their nature. And if you’re the frog carrying that scorpion across the pond and it stings you, well, you can only blame it so much. You knew this when they offered the deal.
Every time. It hasn’t changed substantially since they declared that Linux is cancer, except to be more subtle in their attacks.
    if len(my_list) > 5:
        print(my_list[5])
(i.e. because a different thread can pop from the list in-between the check and the print), that could just as easily happen today. The GIL makes sure that only one thread runs Python bytecode at a time, but it's entirely possible that the GIL is released and execution switches to a different thread after the check but before the print, so there's no extra thread-safety issue in free-threaded mode.
The problems (as I understand it, happy to be corrected) are mostly two-fold: performance and ecosystem. Using fine-grained locking is potentially much less efficient than using the GIL in the single-threaded case (you have to take and release many more locks, and reference count updates have to be atomic), and many, many C extensions are written under the assumption that the GIL exists.
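For completeness, the check-then-print race on `my_list` described above is closed the same way with or without the GIL: hold one lock across both the check and the read. A minimal sketch, where `my_list_lock`, `read_sixth` and `pop_last` are hypothetical names:

```python
import threading

my_list = list(range(10))
my_list_lock = threading.Lock()  # hypothetical: guards every access to my_list


def read_sixth():
    # Check and read under one lock, so no pop can slip in between.
    with my_list_lock:
        if len(my_list) > 5:
            return my_list[5]
    return None


def pop_last():
    # Writers must take the same lock, otherwise the guard above is useless.
    with my_list_lock:
        return my_list.pop() if my_list else None
```

This only works if every piece of code touching the list goes through the lock, which is the part that is easy to get wrong in a large code base.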
But if you tried to compile it on today’s libc, making today’s syscalls… good luck with that.
Software “rots” in the sense that it has to be updated to run on today’s systems. They’re a moving target. You can still run HyperCard on an emulator, but good luck running it unmodded on a Mac you buy today.
If software is implicitly built on wrong understanding, or undefined behaviour, I consider it rotting when it starts to fall apart as those undefined behaviours get defined. We do not need to sacrifice a stable future because of a few 15 year old programs. Let the people who care about the value that those programs bring, manage the update cycle and fix it.
Coming from the Java world, you don't know what you're missing. Looking inside an application and seeing a bunch of threadpools managed by competing frameworks, debugging timeouts and discovering that tasks are waiting more than a second to get scheduled on the wrong threadpool, tearing your hair out because someone split a tiny sub-10μs bit of computation into two tasks and scheduling the second takes a hundred times longer than the actual work done, adding a library for a trivial bit of functionality and discovering that it spins up yet another threadpool when you initialize it.
(I'm mostly being tongue in cheek here because I know it's nice to have threading when you need it.)
Obviously I know that companies aren't people and don't have feelings, but I can't understand why you would intentionally avoid using their chosen name, even when it's more effort to you.
A fairly common pattern for me is to start a terminal UI updating thread that redraws the UI every second or so while one or more background threads do their thing. Sometimes, it’s easier to express something with threads and we do it not to make the process faster (we kind of accept it will be a bit slower).
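A rough sketch of that pattern, assuming a single background worker and a caller-supplied `redraw` callback (all names here are illustrative, not from any particular library):

```python
import threading
import time


def run_with_status(work_items, process, redraw, interval=1.0):
    """Process items in a background thread while a UI thread redraws periodically."""
    done = threading.Event()
    progress = {"count": 0}
    lock = threading.Lock()  # `progress` is the only shared mutable state

    def worker():
        try:
            for item in work_items:
                process(item)
                with lock:
                    progress["count"] += 1
        finally:
            done.set()  # always release the UI thread, even on errors

    def ui():
        # Event.wait doubles as an interruptible sleep between redraws.
        while not done.wait(interval):
            with lock:
                redraw(progress["count"])
        with lock:
            redraw(progress["count"])  # final frame

    threads = [threading.Thread(target=worker), threading.Thread(target=ui)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return progress["count"]
```

For example, `run_with_status(range(100), lambda x: time.sleep(0.05), print)` prints the running count roughly once per second until the work is finished.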
The real enemy is state that can be mutated from more than one place. As long as you know who can change what, threads are not that scary.
Even more fun: allocating memory could trigger Python's garbage collector which would also run `__del__` functions. So every allocation was also a possible (but rare) thread switch.
The GIL was only ever intended to protect Python's internal state (esp. the reference counts themselves); any extension modules assuming that their own state would also be protected were likely already mistaken.
Maybe. I would expect that 99% of python code going forward will still be single threaded. You just don’t need that extra complexity for most code. So I would expect that python code as a whole will have worse performance, even though a handful of applications will get faster.
I think if someone set out to write a new dynamic scripting language today, from scratch, multithreading it would not pose any particular challenge. Beyond the fact that it's naturally a difficult problem, I mean, but nothing special compared to the many other languages that have implemented threading. It's all about all that code from before the threading era that's the problem, not the threading itself. And Python has a loooot of that code.
There's a part of me that wants to scream at them:
"Look around you!!! It's not 1999 anymore!!! These days we have Google, Amazon, Apple, Facebook, etc, which are just as bad if not worse!!! Cut it out with the 20+ year old bad jokes!!!"
Yes, Microsoft is bad. The reason Micr$oft was the enemy back in the day is because they... won. They were bigger than anyone else in the fields that mattered (except for server-side, where they almost won). Now they're just 1 in a gang of evils. There's nothing special about them anymore. I'm more scared of Apple and Google.
I wonder why people never complained so much about JavaScript not having shared-everything threading. Maybe because JavaScript is so much faster that you don't have to reach for it as much. I wish more effort was put into baseline performance for Python.
Spawning a PYTHON interpreter process might take 30 ms to 300 ms before you get to main(), depending on the number of imports
It's 1 to 2 orders of magnitude difference, so it's worth being precise
This is a fallacy with say CGI. A CGI in C, Rust, or Go works perfectly well.
e.g. sqlite.org runs with a process PER REQUEST - https://news.ycombinator.com/item?id=3036124
"Python programmers are so incompetent that Python succeeds as a language only because it lacks features they wouldn't know to use"
Even if it's circumstantially true, doesn't mean it's the right guiding principle for the design of the language.
And if there's a good free-threaded HTTP server implementation, the RPS of "Python code as a whole" could increase dramatically.
I'm thankful that it does, or I would have been out of work long ago. It's not that the files change (literal rot), it is that hardware, OSes, libraries, and everything else changes. I'm also thankful that we have not stopped innovating on all of the things the software I write depends on. You know, another thing changes - what we are using the software for. The accounting software I wrote in the late 80s... would produce financial reports that were what was expected then, but would not meet modern GAAP requirements.
But the thing is that Microsoft hasn’t seemed to fundamentally change since 1999. They appear kinder and friendlier but they keep running the same EEE playbook everywhere they can. Lots of us give them a free pass because they let us run a nifty free-for-now programming editor. That doesn’t change the leopard’s spots, though.
> A global interpreter lock (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via sys.setswitchinterval(). Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program.
https://docs.python.org/3/faq/library.html#what-kinds-of-glo...
If this is not the case please let the official python team know their documentation is wrong. It indeed does state that if Py_DECREF is invoked the bets are off. But a ton of operations never do that.
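One caveat worth seeing concretely: the guarantee quoted above is per bytecode instruction, and an innocent-looking statement usually compiles to several of them. A quick way to check, using the standard `dis` module (the `increment` function is just an example; exact opcode names differ between CPython versions):

```python
import dis


def increment(counter):
    counter["n"] += 1


# List the bytecode instructions behind the single source line above.
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)
```

The output contains separate load, add and store steps, so a thread switch between any two of them can lose an update, with or without the GIL.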
A classic example is ActiveX.
Python has a lot of solid workarounds for avoiding threading because until now Python threading has absolutely sucked. I had naively tried to use it to make a CPU-bound workload twice as fast and soon realized the implications of the GIL, so I threw all that code away and made it multiprocessing instead. That sucked in its own way because I had to serialize lots of large data structures to pass around, so 2x the cores got me about 1.5x the speed and a warmer server room.
I would love to have good threading support in Python. It’s not always the right solution, but there are a lot of circumstances where it’d be absolutely peachy, and today we’re faking our way around its absence with whole playbooks of alternative approaches to avoid the elephant in the room.
But yes, use async when it makes sense. It’s a thing of beauty. (Yes, Glyph, we hear the “I told you so!” You were right.)
Sabotaging forks is scummy, but the forks were extending MS functionality, not the other way around.
GitHub was a private company before it was bought by MS. Rate limiting is.... not great, but certainly not an extinguish play.
EEE refers to the subversion of open standards or independent free software projects. It does not apply to any of the above.
MS are still scummy but at least attack them on their own demerits, and don't parrot some schtick from decades ago.
Nah, even that was based on earlier MS technologies - OLE and COM
A good starter list of EEE plays is on the wikipedia page: https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...
MS has continued to metastasize and is in some ways worse than the old days, even if they’ve finally accepted the utility of open source as a loss leader.
They have the only BigTech products I’ve been forced to use if I want to eat.
Software doesn't rot, it remains constant. But the context around it changes, which means it loses usefulness slowly as time passes.
What is the name for this? You could say 'software becomes anachronistic'. But is there a good verb for that? It certainly seems like something that a lot more than just software experiences. Plenty of real world things that have been perfectly preserved are now much less useful because the context changed. Consider ox yokes, typewriters, horse-drawn carriages, envelopes, phone switchboards, etc.
It really feels like this concept should have a verb.
In contrast, no one thinks about what happens if a thread dies independently because the failure mode is joint.
So even without EEE, I think it’s supremely risky to hitch your wagon to their tech or services (unless you’re writing primarily for Windows, which is what they’d love to help you migrate to). And I can’t be convinced the GitHub acquisition wasn’t some combination of these dark patterns.
Step 1: Get a plurality of the world’s FOSS into one place.
Step 2: Feed it into an LLM and then embed it in a popular free editor so that everyone can use GPL code without actually having to abide by the license.
Step 3: Make it increasingly hard to use for FOSS development by starting to add barriers a little at a time. <= we are here
As a developer, they’ve done nothing substantial to earn my trust. I think a lot of Microsoft employees are good people who don’t subscribe to all this and who want to do the right thing, but corporate culture just won’t let that be.
When you look from the program's perspective, the context changes and becomes unrecognizable, IOW, it rots.
When you look from the context's perspective, the program changes by not evolving and keeping up with the context, IOW, it rots.
Maybe we anthropomorphize both and say "they grow apart". :)
OK, finally, yes, this is very true, for specific parts of their tech.
But banging on about EEE just distracts from this, more important message.
> Make it increasingly hard to use for FOSS development by starting to add barriers a little at a time. <= we are here
....and now you've lost me again
Multithreaded code is incredibly hard to reason about. And reasoning about it becomes a lot easier if you have certain guarantees (e.g. this argument / return value always has this type, so I can always do this to it). Code written in dynamic languages will more often lack such guarantees, because of the complicated signatures. This makes it even harder to reason about multithreaded code, increasing the risk it poses.
Is it because Rust is just fast? Nope. For anything after resolving dependency versions raw CPU performance doesn't matter at all. It's that writing concurrent PLUS parallel code in Rust is easier: it doesn't need to spawn a few processes and wait for the interpreter to start in each, and it doesn't need to constantly serialize whatever shit you want to run. So, someone did it!
Yet, there's a pip maintainer who actively sabotages free-threading work. Nice.
Hanlon’s razor is a thing, and I generally follow it. It’s just that I’ve seen Microsoft make so many “oops, our bad!” mistakes over the years that purely coincidentally gave them an edge up over their competition, that I tend to distrust such claims from them.
I don’t feel that way about all corps. Oracle doesn’t make little mistakes that accidentally harm the competition while helping themselves. No, they’ll look you in the eye and explain that they’re mugging you while they take your wallet. It’s kind of refreshingly honest in its own way.
I’m not anti-MS as much as anti their behavior, whoever is acting that way. This thread is directly related to MS so I’m expressing my opinion on MS here. I’ll be more than happy to share my thoughts on Chrome in a Google thread.
> Examples by Microsoft
> Browser incompatibilities
> The plaintiffs in an antitrust case claimed Microsoft had added support for ActiveX controls in the Internet Explorer Web browser to break compatibility with Netscape Navigator, which used components based on Java and Netscape's own plugin system.
Fibers
Green threads
Coroutines
Actors
Queues (eg GCD)
…
Basically you need to reason about what your thing will do. Separate concerns. Each thing is a server (microservice?) with its own backpressure.
They schedule jobs on a queue.
The jobs come with some context, I don’t care if it’s a closure on the heap or a fiber with a stack or whatever. Javascript being single threaded with promises wastefully unwinds the entire stack for each tick instead of saving context. With callbacks you can save context in closures. But even that is pretty fast.
Anyway then you can just load-balance the context across machines. Easiest approach is just to have server affinity for each job. The servers just contain a cache of the data so if the servers fail then their replacements can grab the job from an indexed database. The insertion and the lookup is O(log n) each. And jobs are deleted when done (maybe leaving behind a small log that is compacted) so there are no memory leaks.
Oh yeah and whatever you store durably should be sharded and indexed properly, so practically unlimited amounts can be stored. Availability in a given shard is a function of replicating the data, and the economics of it is that the client should pay with credits for every time they access. You can even replicate on demand (like bittorrent re-seeding) to handle spikes.
This is the general framework whether you use Erlang, Go, Python or PHP or whatever. It scales within a company and even across companies (as long as you sign/encrypt payloads cryptographically).
It doesn’t matter so much whether you use php-fpm with threads, or swoole, or the new kid on the block, FrankenPHP. Well, I should say I prefer the shared-nothing architecture of PHP and APC. But in Python, it is the same thing with eg Twisted vs just some SAPI.
You’re welcome.
if you're running in something like AWS fargate, there is no shared memory. have to use the network and file system which adds a lot of latency, way more than spawning a process.
copying processes through fork is a whole different problem.
green threads and an actor model will get you much further in my experience.
It's a big project that's going to take lots of time by lots of people to finish. Keep it behind opt-in, keep accepting pull requests after rigorous testing, and it's fine.
I don't fully understand the challenge with removing it, but thought it was something about C extensions, not something most users have to directly worry about.
In Rust if a thread holding a mutex dies the mutex becomes poisoned, and trying to acquire it leads to an error that has to be handled. As a consequence every rust developer that touches a mutex has to think about that failure mode. Even if in 95% of cases the best answer is "let's exit when that happens".
The operating system tends to treat your whole process as one and shut down everything or nothing. But a thread can still crash on its own due to unhandled OOM, assertion failures or any number of other issues.
This is a fair observation.
I think a part of the problem is that the things that make GIL-less Python hard are also the things that make faster baseline performance hard, i.e. an over-reliance of the ecosystem on the shape of the CPython data structures.
What makes python different is that a large percentage of python code isn't python, but C code targeting the CPython api. This isn't true for a lot of other interpreted languages.
Python the language is pretty bad. Python the ecosystem of libraries and tools has no equal, unfortunately.
Switching a language is easy. Switching a billion lines of library less so.
And the tragic part is that many of the top “python libraries” are just Python interfaces to a C library! But if you want to switch to a “better language” that fact isn’t helpful.
Nobody sane tries to do math in JS. Backend JS is recommended for situations where processing is minimal and it is mostly lots of tiny IO requests that need to be shunted around.
I'm a huge JS/Node proponent and if someone says they need to write a backend service that crunches a lot of numbers, I'll recommend choosing a different technology!
For some reason Python peeps keep trying to do actual computations in Python...
But changing the language in a brownfield project is hard. I love Go, and these days I don’t bother with Python if I know the backend needs to scale.
But Python’s ecosystem is huge, and for data work, there’s little alternative to it.
With all that said, JavaScript ain’t got shit on any language. The only good thing about it is Google’s runtime, and that has nothing to do with the language. JS doesn’t have true concurrency and is a mess of a language in general. Python is slow, riddled with concurrency problems, but at least it’s a real language created by a guy who knew what he was doing.
That's not really true on POSIX. Unless you're doing nutty things with clone(), or you actually have explicit code that calls pthread_exit() or gettid()/pthread_kill(), the whole process is always going to die at the same time.
POSIX signal dispositions are process-wide, the only way e.g. SIGSEGV kills a single thread is if you write an explicit handler which actually does that by hand. Unhandled exceptions usually SIGABRT, which works the same way.
** Just to expand a bit: there is a subtlety in that, while dispositions are process-wide, one individual thread does indeed take the signal. If the signal is handled, only that thread sees -EINTR from a blocking syscall; but if the signal is not handled, the default disposition affects all threads in the process simultaneously no matter which thread is actually signalled.
I don't know how that's done in Pyside, though. I couldn't find a clear example. You might have to use a QThread instead to handle it.
Fucking hell bud :D
I was with OP's point but then you lost me. You'll always have to deal with that coworker's shitty code, GIL or not.
Could they make a worse mess with multi threading? Sure. Is their single threaded code as bad anyway because at the end of the day, you can't even begin to understand it? Absolutely.
But yeah I think Python people don't know what they're asking for. They think GIL-less Python is gonna give everyone free puppies.
Neither solve the copying problem, though.
Wow. Could you elaborate?
https://news.ycombinator.com/item?id=44009754 gives some concrete details on fork() speed on current Linux: 50μs for a small process, 700μs for a regular process, 1300μs for a venti Python interpreter process, 30000–50000μs for Python interpreter creation. This is on a CPU of about 10 billion instructions per second per core, so forking costs on the order of ½–10 million instructions.
httpdito http://canonical.org/~kragen/sw/dev3/server.s (which launches a process per request) seems to take only about 50μs because it's not linked with any C library and therefore only maps 5 pages. Also, that doesn't include the time for exit() because it runs multiple concurrent child processes.
On this laptop, a Ryzen 5 3500U running at 2.9GHz, forkovh takes about 330μs built with glibc and about 130–140μs built with dietlibc, and `time python3 -c True` takes about 30000–50000μs. I wrote a Python version of forkovh http://canonical.org/~kragen/sw/dev3/forkovh.py and it takes about 1200μs to fork(), _exit(), and wait().
If anyone else wants to clone that repo and test their own machines, I'm interested to hear the results, especially if they aren't in Linux. `make forkovh` will compile the C version.
1200μs is pretty expensive in some contexts but not others. Certainly it's cheaper than spawning a new Python interpreter by more than an order of magnitude.
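If you'd rather not clone anything, a measurement of the same kind can be sketched in a few lines. This is not the forkovh.py linked above, just a rough POSIX-only approximation of what such a benchmark does; the function name and iteration count are arbitrary.

```python
import os
import time


def mean_fork_cost_us(iterations=100):
    """Average wall-clock microseconds for fork() + _exit() + wait() (POSIX only)."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping interpreter cleanup
        os.waitpid(pid, 0)  # parent: reap the child before forking again
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"{mean_fork_cost_us():.0f} µs per fork/_exit/wait")
```

The number depends heavily on how much memory the interpreter has mapped at the time of the fork, so a freshly started interpreter will look cheaper than one with numpy and friends imported.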
That's lucky. On constrained systems launching a new interpreter can very well take 10 seconds. Python is ssssslllloooowwwww.
Which is why, at least on Linux, Python's multiprocessing doesn't do that but fork()s the interpreter, which takes low-single-digit ms as well.
I'm launching between 15 and 3000 processes per request. While Plan 9 is about 10x faster at spawning processes than Linux, it's telling that 3000 C processes launching in a shell is about as fast as one python interpreter.
Not all use cases of Python and Windows intersect (how much web server stuff is a Windows / IIS / SQL Server / Python stack? Probably not many, although WISP is a nice acronym), but you’ve still got to bear it in mind for people doing heavy numpy stuff on their work laptop or whatever.
But also keep in mind that cleanup for a Python process also takes time, which is harder to trace.
Refs:
https://docs.python.org/3/library/multiprocessing.html#conte... https://stackoverflow.com/questions/72497140
So this doesn't seem like a versatile solution for sharing data structs between two Python processes. You're gonna have to reserialize the whole thing if one side wants to edit, which is basically copying.
There has been. That's why the bytecode is incompatible between minor versions. It was a major selling(?) point for 3.11 and 3.12 in particular.
But the "Faster CPython" team at Microsoft was apparently just laid off (https://www.linkedin.com/posts/mdboom_its-been-a-tough-coupl...), and all of the optimization work has to my understanding been based around fairly traditional techniques. The C part of the codebase has decades of legacy to it, after all.
Alternative implementations like PyPy often post impressive results, and are worth checking out if you need to worry about native Python performance. Not to mention the benefits of shifting the work onto compiled code like NumPy, as you already do.
Mainly cause Python is often used for data pipelines in ways that JS isn't, causing situations where you do want to use multiple CPU cores with some shared memory. If you want to use multiple CPU cores in NodeJS, usually it's just a load-balancing webserver without IPC and you just use throng, or maybe you've got microservices.
Also, JS parallelism simply excelled from the start at waiting on tons of IO, there was no confusion about it. Python later got asyncio for this, and by now regular threads have too much momentum. Threads are the worst of both worlds in Py, cause you get the overhead of an OS thread and the possibility of race conditions without the full parallelism it's supposed to buy you. And all this stuff is confusing to users.
Fargate isn't just ECS and plain containers.
You cannot use shared memory in Fargate; there is literally no /dev/shm.
See "sharedMemorySize" here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...
> If you're using tasks that use the Fargate launch type, the sharedMemorySize parameter isn't supported.
Please don't - it isn't relevant.
15 years ago, new Python code was still dominantly for 2.x. Even code written back then with an eye towards 3.x compatibility (or, more realistically, lazily run through `2to3` or `six`) will have quite little chance of running acceptably on 3.14 regardless. There have been considerable removals from the standard library, `async` is no longer a valid identifier name (you laugh, but that broke Tensorflow once). The attitude taken towards """strings""" in a lot of 2.x code results in constructs that can be automatically made into valid syntax that appears to preserve the original intent, but which are not at all automatically fixed.
Also, the modern expectation is of a lock-step release cadence. CPython only supports up to the last 5 versions, released annually; and whenever anyone publishes a new version of a package, generally they'll see no point in supporting unsupported Python versions. Nor is anyone who released a package in the 3.8 era going to patch it if it breaks in 3.14 - because support for 3.14 was never advertised anyway. In fact, in most cases, support for 3.9 wasn't originally advertised, and you can't update the metadata for an existing package upload (you have to make a new one, even if it's just a "post-release") even if you test it and it does work.
Practically speaking, pure-Python packages usually do work in the next version, and in the next several versions, perhaps beyond the support window. But you can really never predict what's going to break. You can only offer a new version when you find out that it's going to break - and a lot of developers are going to just roll that fix into the feature development they were doing anyway, because life's too short to backport everything for everyone. (If there's no longer active development and only maintenance, well, good luck to everyone involved.)
If 5 years isn't long enough for your purposes, practically speaking you need to maintain an environment with an outdated interpreter, and find a third party (RedHat seems to be a popular choice here) to maintain it.
No, that's exactly the point I'm making: copying PTEs is not cheap on a large address space with many VMAs.
You can run a simple Python script that allocates a large list and see how it affects fork time.
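Something along these lines would do it. This is only a rough sketch, Unix-only because it relies on os.fork(); the helper names time_fork and compare and the list size are made up for illustration, and the numbers will vary a lot by machine and kernel.

```python
import os
import time


def time_fork(repeats=20):
    """Return the mean wall-clock seconds for fork() + _exit() + wait()."""
    start = time.perf_counter()
    for _ in range(repeats):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: leave immediately, skipping all cleanup
        os.waitpid(pid, 0)  # parent: reap the child
    return (time.perf_counter() - start) / repeats


def compare(n_items=2_000_000):
    """Measure fork cost before and after allocating a large list."""
    small = time_fork()
    big_list = list(range(n_items))  # more pages whose PTEs fork() must copy
    large = time_fork()
    del big_list
    return small, large


if __name__ == "__main__":
    small, large = compare()
    print(f"before allocation: {small * 1e6:.0f} us per fork")
    print(f"after allocation:  {large * 1e6:.0f} us per fork")
```

Raise n_items to see how the second number moves as the address space grows.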
ᐅ time echo "print('hi'); exit()" | python
hi
________________________________________________________
Executed in 21.48 millis fish external
usr time 16.35 millis 146.00 micros 16.20 millis
sys time 4.49 millis 593.00 micros 3.89 millis
In my estimation, the only "20 year old bedrock tools" in Python are in the standard library - which currently holds itself free to deprecate entire modules in any minor version, and remove them two minor versions later - note that this is a pseudo-calver created by a coincidentally annual release cadence. (A bunch of stuff that old was taken out recently, but it can't really be considered "bedrock" - see https://peps.python.org/pep-0594/).
Unless you include NumPy's predecessors when dating it (https://en.wikipedia.org/wiki/NumPy#History). And the latest versions of NumPy don't even support Python 3.9 which is still not EOL.
Requests turns 15 next February (https://pypi.org/project/requests/#history).
Pip isn't 20 years old yet (https://pypi.org/project/pip/#history) even counting the version 0.1 "pyinstall" prototype (not shown).
Setuptools (which generally supports only the Python versions supported by CPython, hasn't supported Python 2.x since version 45 and is currently on version 80) only appears to go back to 2006, although I can't find release dates for versions before what's on PyPI (their own changelog goes back to 0.3a1, but without dates).
l = list(cleanup=False)
for i in range(1_000_000_000): l.append(i)
telling the runtime that we don't need to individually GC each of those tiny objects and just let the OS's process model free the whole thing at once.
Sure, close TCP connections before you kill the whole thing. I couldn't care less about most objects, though.
It seems the way to do it in Qt is with signals and slots, emitting a signal from your QThread and binding it to a slot in the UI thread, making sure to specify a "queued connection" [1]. There's also a lower-level postEvent method [2] but people disagree [3] on whether that's OK to call from a regular Python thread or has to be called from a QThread.
So I would try doing it with Qt's thread classes, not with concurrent.futures.
[1] https://doc.qt.io/qt-5/threads-synchronizing.html#high-level...
[2] https://doc.qt.io/qt-6/qcoreapplication.html#postEvent
[3] https://www.mail-archive.com/pyqt@riverbankcomputing.com/msg...
Especially when they've already been force-fed with ungodly amounts of buggy threaded code that has been mistakenly advertised as bug-free simply because nobody managed to catch the problem with a fuzzer yet (and which is more likely to expose its faults in a no-GIL environment, even though it's still fundamentally broken with a GIL)?
In many cases you can't reasonably expect better than that (https://en.wikipedia.org/wiki/Amdahl's_law). If your algorithm involves sharing "large data structures" in the first place, that's a bad sign.
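For anyone who wants to plug in numbers, Amdahl's law is a one-liner. A minimal sketch; the function name and the 90% / 8 workers figures are just illustrative, not measurements from anything in this thread.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # 90% of the runtime parallelizes perfectly across 8 workers.
    print(round(amdahl_speedup(0.9, 8), 2))  # 4.71
```

Even with 90% of the work parallel, 8 cores give less than 5x, and the serial 10% caps the speedup at 10x no matter how many cores you add.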
Or completely rearchitect the language to have a model of automatic (in the C sense) allocation. I can't see that ever happening.
[1]: https://github.com/python/cpython/blob/3.13/Lib/collections/...
[2]: https://github.com/python/cpython/blob/3.13/Lib/typing.py#L2...
I've had cases where it took Python like 30 seconds to exit after I'd slurped a large CSV with a zillion rows into RAM. At that time, I'd dreamed of a way to tell Python not to bother free()ing any of that, just exit() and let Linux unmap RAM all at once. If you think about it, there probably aren't that many resources you actually care about individually freeing on exit. I'm certain someone will prove me wrong, but at a first pass, objects that don't define __del__ or __exit__ probably don't care how you destroy them.
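The blunt, whole-process version of this already exists: os._exit() ends the process without running interpreter finalization. A minimal sketch, where the wrapper name exit_without_teardown is made up; note that it also skips atexit handlers and any pending finally blocks, so anything you do care about has to be closed by hand first.

```python
import os
import sys


def exit_without_teardown(status=0):
    """Exit immediately and let the OS reclaim memory in one go.

    os._exit() does not run atexit handlers, __del__ methods, or
    finally blocks, and does not flush stdio buffers, so flush by hand.
    """
    sys.stdout.flush()
    sys.stderr.flush()
    os._exit(status)
```

Call it as the last statement of a script, after results are written and files or sockets you care about are closed.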
There may be another way to skin that specific cat. My point isn't to solve one specific problem, but to say that some problems are just inherently large. And with Python, today, if those workers are CPU-bound in Python-land, that means running separate processes and passing large hunks of state around (or shoving it through SHM; same idea, just a different way of passing state).
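To make the SHM route concrete, here is a rough sketch using the standard library's multiprocessing.shared_memory. The helper names publish and read are made up, and the attach happens in the same process only to keep the example short; normally a worker process would do it. Both sides still copy bytes in and out, and anything richer than raw bytes still has to be serialized first.

```python
from multiprocessing import shared_memory


def publish(payload: bytes) -> shared_memory.SharedMemory:
    """Copy payload into a new named shared-memory block."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def read(name: str, length: int) -> bytes:
    """Attach to an existing block by name and copy the bytes out."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The block may be rounded up to a page, so slice to the real length.
        return bytes(shm.buf[:length])
    finally:
        shm.close()


if __name__ == "__main__":
    payload = b"large hunk of state"
    block = publish(payload)
    try:
        # A worker would receive block.name and the length, then call read().
        print(read(block.name, len(payload)))
    finally:
        block.close()
        block.unlink()  # the creator is responsible for removing the block
```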
For the web/network workloads most of us write, I'd highly recommend this.