Ok, I answered my own question.
It’s several things:
* Cutting-edge code, not overly concerned with optimization
* Code written by scientists, who aren’t known for being the world’s greatest programmers
* The obsession the research world has with using Python
Not surprising that there’s a lot of low-hanging fruit that can be optimized.
The interface is designed to be easy to use (Python), while the part actually doing the work is designed for performance (C and CUDA, and may even be running on a TPU).
You're completely correct that the speed-sensitive parts are written in lower-level libraries, but another way to phrase that is "Python can go really fast, as long as you don't use Python." This also means ML is effectively hamstrung into using only methods that already exist and have been coded in C++, since anything written in pure Python would be too slow to compete.
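To make that concrete, here's a minimal sketch of the point (assuming NumPy is installed): the same sum-of-squares computed with a pure-Python loop versus delegated to NumPy, whose inner loop runs in compiled C. The results are identical; only where the loop executes differs.

```python
import time
import numpy as np

n = 1_000_000
data = list(range(n))
arr = np.arange(n)

# Pure Python: every iteration goes through the interpreter.
start = time.perf_counter()
total_py = sum(x * x for x in data)
py_time = time.perf_counter() - start

# NumPy: the same reduction, but the loop runs in C.
start = time.perf_counter()
total_np = int(np.dot(arr, arr))
np_time = time.perf_counter() - start

assert total_py == total_np  # same answer either way
print(f"pure Python: {py_time:.4f}s, NumPy: {np_time:.4f}s")
# The C-backed version is typically one to two orders of magnitude faster.
```

The moment your algorithm doesn't map onto an existing vectorized primitive, you're back in the interpreted loop, which is exactly the constraint described above.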
There are plenty of languages that make good tradeoffs between performance and usability. Python is not one of them. It is, at best, only slightly easier to use than Julia, yet orders of magnitude slower.