The answer is no, it will not. Instead they'll just keep adding more and more syntax. And more and more ways to do the same old things. And they'll say that if you want "fast" then write a native module that we can import and use.
So then what's the point? Is Python really just a glue language like all the rest?
If your only metric for a language is speed then nothing really beats hand crafted assembly. All this memory safety at runtime is just overhead. If you also consider language ergonomics, Python suddenly is not a bad choice at all.
Do check out the articles in the topmost comment about how tail call optimization gets you faster interpreters.
It completely eliminates the overhead of function calls in the generated machine code while you still write your code modularly using functions.
A lambda can be as big an expression as you want, including spanning multiple lines; it can't (because it is an expression) include statements. That's only different from lambdas in most functional languages in that Python actually has statements.
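A minimal illustration (the `clamp` helper is just made up for the example):

  # A lambda can span multiple lines as long as it stays one
  # expression; conditional expressions and calls are fine,
  # statements (assignment, return, loops) are not.
  clamp = lambda x, lo, hi: (
      lo if x < lo
      else hi if x > hi
      else x
  )

  print(clamp(15, 0, 10))  # 10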
https://www.youtube.com/watch?v=qCGofLIzX6g
One case study Ronacher gets into is the torturous path taken through the Python interpreter (runtime?) when you evaluate `__add__`. Fascinating stuff.
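Roughly, the dispatch looks like this Python-level sketch (a simplification: the real logic lives in C type slots, and CPython additionally checks that a subclass actually overrides `__radd__` before giving it priority):

  # Rough sketch of what evaluating `a + b` involves.
  def binary_add(a, b):
      ta, tb = type(a), type(b)
      radd = getattr(tb, "__radd__", None)
      # A subclass of the left operand gets first shot at __radd__.
      if radd is not None and tb is not ta and issubclass(tb, ta):
          result = radd(b, a)
          if result is not NotImplemented:
              return result
          radd = None  # don't try it again below
      add = getattr(ta, "__add__", None)
      if add is not None:
          result = add(a, b)
          if result is not NotImplemented:
              return result
      if radd is not None and tb is not ta:
          result = radd(b, a)
          if result is not NotImplemented:
              return result
      raise TypeError(f"unsupported operand type(s) for +: {ta.__name__} and {tb.__name__}")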
Also, free-threading is coming, so we'll have proper parallel threads soon.
I don't know if Python can ever really be fast. By design, objects are scattered all over memory, and even for something like iterating a list you're chasing pointers to PyObject all over the place - it's just not cache friendly.
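You can see the boxing from Python itself: a list holds pointers to separately heap-allocated objects, while the `array` module packs raw machine values contiguously (sizes below assume 64-bit CPython):

  import sys
  from array import array

  nums = list(range(1000))
  packed = array("q", range(1000))  # contiguous signed 64-bit ints

  print(sys.getsizeof(nums))     # the list's pointer buffer only
  print(sys.getsizeof(nums[0]))  # each small int is its own 28-byte heap object
  print(sys.getsizeof(packed))   # one flat 8-byte-per-element block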
It should only be called Tail Call Elimination.
I published this technique four years ago, and it's very exciting to see that others have taken up the cause and done the work to land it in CPython.
Moreover, Guido is in favour of ongoing addition of major new features (like pattern matching), worrying that without them Python would become a “legacy language”:
https://discuss.python.org/t/pep-8012-frequently-asked-quest...
Not to Rust, but to Go and C++ for myself. The biggest motivating factor is deployment ease. It is so difficult to offer a nice client install process when large virtual environments are involved. Static executables solve so many pain points for me in this arena. Rust would probably shine here as well.
If it's for some internal bespoke process, I do enjoy using Python. For tooling shipped to client environments, I now tend to steer clear of it.
Only if you know the micro-architecture of the processor you are running on in great depth and can schedule the instructions accordingly. Modern compilers and VMs can do crazy stuff at this level.
> Python is fast enough for a whole set of problems AND it is a pretty, easy to read and write language.
It is definitely easy to read. But speed is debatable. It is slow enough for my workload that I'm starting to wonder about moving to PyPy.
It seems like it solves the same problem (saving the function call overhead) and has the same downsides (requires non-standard compiler extensions).
EDIT: it seems the answer is that compilers do not play well with direct-threaded interpreters and they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks
That doesn't follow. This isn't like going from driving a car to flying an airplane. It's like going from driving a car to just teleporting instantly. (Except it's about space rather than time.)
It's a difference in degree (optimization), yes, but by a factor of infinity (O(n) overhead to 0 overhead). At that point it's not unreasonable to consider it a difference in kind (semantics).
The Python community has since matured and realised that what they previously thought of as "one thing" were actually multiple different things with small nuances and it makes sense to support several of them for different use cases.
for (int i = 0; i < n; i++) a += i;
To:
a += n * (n - 1) / 2;
Is this an optimisation or a change in program semantics? I've never heard anyone call it anything else than an optimisation.
I don't follow Python closely, so it may 100% be stuff that GvR endorsed too, or I'm mixing up the timelines. It just feels to me that Python is changing much faster than it did in the 2.x days.
> Is this an optimisation or a change in program semantics?
Note that I specifically said something can be both an optimization and a change in semantics. It's not either-or.
However, it all depends on how the program semantics are defined. They are defined by the language specifications. Which means that in your example, it's by definition not a semantic change, because it occurs under the as-if rule, which says that optimizations are allowed as long as they don't affect program semantics. In fact, I'm not sure it's even possible to write a program that would be guaranteed to distinguish them based purely on the language standard. Whereas with tail recursion it's trivial to write a program that will crash without tail recursion but run arbitrarily long with it.
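For instance, a sketch in Python (which has no TCE): this tail-recursive loop is guaranteed to crash today, but would run forever under tail call elimination:

  def spin(n):
      # A tail call: the recursive result is returned directly.
      return spin(n + 1)

  try:
      spin(0)
  except RecursionError:
      # Without TCE every call consumes a stack frame, so the crash
      # is guaranteed. With TCE the frame would be reused and this
      # program would never halt.
      print("crashed, as CPython guarantees")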
We do have at least one optimization that is permitted even though the as-if rule alone wouldn't allow it: return-value optimization (RVO). People certainly consider that a change in semantics, as well as an optimization.
Nowadays (for about 12 years already I think) there is nothing much stackless about it.
The concept was nice. Stackless and greenlets... yes. But the way they rewrote the C stack just killed caches. Even a naive reimplementation using separate mmapped stacks and wrapping the whole coroutine concept under then-Python's threading module instantly gained something like a 100x speedup on context-switch-heavy loads, like serving small stuff over HTTP.
Edit: Though at this point it didn't differ much from ye olde FreeBSD N:M pthread implementation. Which ended badly, if anyone remembers.
An optimization that speeds a program by x2 has the same effect as running on a faster CPU. An optimization that packs things tighter into memory has the same effect as using more memory.
Program semantics is usually taken to mean “all output given all input, for any input configuration”, ignoring memory use and CPU time, provided both are finite (but not limited).
TCE easily converts a program that will halt, regardless of available memory, to one that will never halt, regardless of available memory. That’s a big change in both theoretical and practical semantics.
I probably won’t argue that a change that reduces an O(n^5) space/time requirement to an O(1) requirement is a change in semantics, even though it practically is a huge change. But TCE changes a most basic property of a finite memory Turing machine (halts or not).
We don’t have infinite memory Turing machines.
edited: Turing machine -> finite memory Turing machine.
Python's problem is that the non-new stuff is not always backwards compatible. It happens way too often that a new Python version comes out and half the Python programs on my system just stop working.
https://blog.reverberate.org/2021/04/21/musttail-efficient-i...
If you look at the feature in detail, and especially how it clashes with the rest of the language, it's awful. For example:
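Assuming the feature in question is structural pattern matching, the classic clash: in a `match` block a bare name is a capture pattern, which binds, rather than a comparison against an existing constant:

  RED = 1

  def name_of(color):
      match color:
          case RED:         # does NOT compare against the constant RED;
              return "red"  # it binds a new local RED and matches anything
      return "unknown"

  print(name_of(2))  # prints "red"

  # The workaround is dotted names, which ARE value patterns:
  import enum

  class Color(enum.Enum):
      RED = 1

  def name_of_enum(color):
      match color:
          case Color.RED:   # compared, not bound
              return "red"
      return "unknown"

  print(name_of_enum(2))  # prints "unknown"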
A guy on r/WritingWithAI is building a new writing assistant tool using Python and PyQt. He is not an SE by trade. Even so, the installation instructions are:
- Install Python from the Windows app store
- Windows + R -> cmd -> pip install ...
- Then run python main.py
This is fine for technical people. Not regular folks.
For most people, these incantations, typed as-is into a black window, mean nothing; it is a terrible way of delivering a piece of software to the end user.
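One common workaround, assuming a bundler like PyInstaller fits the project, is to ship a single executable instead of install instructions:

- pip install pyinstaller

- pyinstaller --onefile main.py

- hand the user dist\main.exe; no Python install or pip incantations required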
(More here: https://noelwelsh.com/posts/understanding-vm-dispatch/)
It's not a downside if:
(a) you have those non-standard compiler extensions in the platforms you target
(b) for the rest, you can ifdef an alternative that doesn't require them
Maybe it is only a thing for those of us already damaged by C++, with enough years of experience using it, but there are still plenty of such folks around to matter, especially to GPU vendors and compiler writers.
Space/time requirements aren't language semantics though, are they?
With this kind of "benign" change, all programs that worked before still work, and some that didn't work before now work. I would argue this is a good thing.
Python's thirty years of evolution really shows at this point.
Given that one of the fundamental rules of programming is "don't use magic numbers, prefer named constants", that's terrible language design.
But I think you can get a fine balance by keeping a recent call trace (in a ring buffer?). Lua does this and honestly it's OK, once you get used to the idea that you're not looking at stack frames, but execution history.
IMHO Python should add that, and it should clearly distinguish between which part of a crash log is a stack trace, and which one is a trace of tail calls.
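A sketch of the idea (hypothetical, not an actual CPython proposal): a trampoline that records each tail call in a bounded deque, so a crash could print recent execution history alongside the real stack trace:

  from collections import deque

  TAIL_TRACE = deque(maxlen=32)  # ring buffer of recent tail calls

  class TailCall:
      def __init__(self, fn, *args):
          self.fn, self.args = fn, args
          TAIL_TRACE.append(f"{fn.__name__}{args}")

  def run(fn, *args):
      # Trampoline: loop instead of growing the stack.
      while True:
          result = fn(*args)
          if not isinstance(result, TailCall):
              return result
          fn, args = result.fn, result.args

  def countdown(n):
      return TailCall(countdown, n - 1) if n else "done"

  print(run(countdown, 100_000))  # no RecursionError
  print(*TAIL_TRACE, sep="\n")    # only the last 32 tail calls survive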
Either way this is going to be quite a drastic change.
At least in my case I use it all the time, to avoid duplicated operations inside comprehensions.
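For example, assuming the feature being defended is the walrus operator:

  data = ["1", "x", "23", ""]

  def parse(s):
      return int(s) if s.isdigit() else None

  # Without :=, parse() runs twice per element:
  nums = [parse(s) for s in data if parse(s) is not None]

  # With it, once:
  nums = [n for s in data if (n := parse(s)) is not None]
  print(nums)  # [1, 23]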
PyPy is a valid option and one I would explore if it fits what you are doing.
I think part of the reason Guido stepped down was that the BDFL structure created too much load on him dealing with actual and potential change, so it's unsurprising that the rate of change increased when the governance structure changed to one that managed change without imposing the same load on a particular individual.
> they are able to perform more/better optimizations when looking at normal-sized functions rather than massive blocks
It's not the size of the function that is the primary problem, it is the fully connected control flow that gums everything up. The register allocator is trying to dynamically allocate registers through each opcode's implementation, but it also has to connect the end of every opcode with the beginning of every opcode, from a register allocation perspective.
The compiler doesn't understand that every opcode has basically the same set of "hot" variables, which means we benefit from keeping those hot variables in a fixed set of registers basically all of the time.
With tail calls, we can communicate a fixed register allocation to the compiler through the use of function arguments, which are always passed in registers. When we pass this hot data in function arguments, we force the compiler to respect this fixed register allocation, at least at the beginning and the end of each opcode. Given that constraint, the compiler will usually do a pretty good job of maintaining that register allocation through the entire function.
Python dicts were in insertion order for 3.6, but this only became a guarantee, as opposed to an implementation choice that could be changed at any time, with Python 3.7.
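In other words, since 3.7 this is guaranteed by the language, not just by CPython's implementation:

  d = {}
  d["banana"] = 1
  d["apple"] = 2
  d["cherry"] = 3

  # Guaranteed insertion order, not alphabetical or hash order:
  print(list(d))  # ['banana', 'apple', 'cherry']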
I've been looking at Rust and it's a big improvement over C, but it still strikes me as a work in progress, and its attitude is less paranoid than that of Ada. I'd at least like to see options to crank up the paranoia level. Maybe Ada itself will keep adapting too. Ada is clunky, but it is way more mature than Rust.