Seeing such clever use of mmap makes me dread to imagine how much Python spaghetti is probably tanking OpenAI's and other "big ML" shops' infra when they should have trusted zero-copy solutions.
Perhaps SWE is dead after all, but LLMs didn't kill it...
Probably not all that much. All of the Python numeric computing frameworks (Numpy, PyTorch, TensorFlow, etc.) are basically just wrappers for lower level C++/C/Fortran code. Unless you’re doing something boneheaded and converting framework-native tensors to Python objects, passing tensors around within a framework essentially just passes a pointer around, which has marginal overhead even when encapsulated in a bloated Python object.
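A quick way to see this (assuming PyTorch is installed): hand a multi-megabyte tensor to a plain Python function and check that both sides see the same underlying buffer, while the Python wrapper object itself is only a few dozen bytes.

```python
# Sketch: passing a framework-native tensor around only passes a handle to it.
import sys
import torch

def takes_a_tensor(t: torch.Tensor) -> int:
    # No copy happens here; we just read the address of the existing buffer.
    return t.data_ptr()

x = torch.randn(1024, 1024)                 # ~4 MiB of float32 payload
print(x.element_size() * x.nelement())      # 4194304 bytes in the buffer
print(sys.getsizeof(x))                     # the Python wrapper object is tiny
print(takes_a_tensor(x) == x.data_ptr())    # True: callee saw the same buffer
```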
Indeed, a huge number of PyTorch operations are explicitly zero copy: https://pytorch.org/docs/stable/tensor_view.html
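For instance (again assuming PyTorch), `view()` and basic slicing from that page return tensors that share storage with the base tensor, so a write through the view shows up in the original:

```python
# Sketch: view operations share storage with the base tensor; no copy is made.
import torch

base = torch.arange(16, dtype=torch.float32)
v = base.view(4, 4)                      # zero-copy reshape

print(v.data_ptr() == base.data_ptr())   # True: same underlying buffer

v[0, 0] = 42.0                           # write through the view...
print(base[0])                           # tensor(42.) ...is visible in the base
```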
This leads to all sorts of bugs and breaking changes that are cool in an academic or hobbyist setting but a total headache on a large production system.
In fact, I'd love to see the Transformer really dominate. Then we could start to converge on the software side. And compute-wise, transformers are really simple, too!
It sounds unnecessarily weird to me that people would share Python code that simply doesn't work at all out of the box.
Never understood why people think that indentation-based languages are somehow simpler, when in fact they bring all kinds of trouble for getting things done.
If you just write straight C++ (without C++xx or anything like it), you can compile the code on machines from decades ago if you want.
If you had coded your model and training loop in C++, you could easily build a standalone binary (well, it would be GiB+ if you use CUDA... but that's the cost of statically linking cu*).
It then happily runs anywhere an NVIDIA GPU driver is available (no need to install CUDA itself).
Protip: your AI researchers REALLY DON'T WANT TO DO THIS, BECAUSE THEY LOVE PYTHON. Having Python, even with the dependency management shit, is a feature, not a bug.
(If you want Rust/Go and don't want to wrap libtorch/TF, then you have a lot of work ahead of you, but yeah, it's possible. There are also the model-compiler folks [1], where the promise is model.py in, model.o out, and you just link it with your code.)
[1] https://mlc.ai
Suppose you have a big piece of compute hardware (e.g. at a university) which is shared by multiple users. They all want to come in and play with these models. Each one is tens to hundreds of gigabytes. Is each user supposed to have their own copy in their home directory?
But now we encounter this broken nonsense because solved problems get unsolved by bad software.
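This is exactly the situation mmap was built for: if every user maps the same read-only weights file, the OS page cache keeps a single copy of the hot pages for everyone. A minimal sketch, with a hypothetical shared path and dtype:

```python
# Sketch: every user/process maps the same read-only file; pages are shared
# out of the OS page cache instead of being copied into each home directory.
import numpy as np

SHARED_WEIGHTS = "/shared/models/weights.bin"   # hypothetical shared path

# mode="r" maps the file read-only; only the pages actually touched get
# faulted in, and those pages are shared across every process mapping the file.
weights = np.memmap(SHARED_WEIGHTS, dtype=np.float16, mode="r")
print(weights.nbytes)   # size of the mapping, not of any per-user copy
```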
Python is not the cause of dependency hell. Deep dependency trees are. The only way to deal with this is to use separate environments and to carefully pin exact requirements.
Those who claim some language would be a magical fix clearly lack experience in multiple languages.
I've been very _careful_ too (using pyenv, virtualenvs, etc.) with dependency management, but between NVIDIA driver dependencies, "missing sqlite3/bz2" issues in the underlying interpreter, and the differences between Python 3.x versions, I'm lucky to even get a 'hello world' ML sample running after an afternoon of fighting with it.
My Ubuntu install with an NVIDIA card only seems to recognize the GPU in some circumstances, even when using the same `conda` env. Often this is remedied by rebooting the machine(?).
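When the GPU vanishes like that, a minimal sanity check (assuming PyTorch with a CUDA build) at least tells you whether it's the env or the driver acting up:

```python
# Quick check: can this env's PyTorch build see the driver at all?
import torch

print(torch.__version__)            # which torch build this env resolved to
print(torch.version.cuda)           # CUDA version the wheel was built against
print(torch.cuda.is_available())    # False usually means a driver/build mismatch
if torch.cuda.is_available():
    print(torch.cuda.device_count(), torch.cuda.get_device_name(0))
```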
No idea how companies manage this stuff in production. Absolute minefield that seems to catastrophically break if you sneeze at it.
I'll admit I am not an expert in managing ML envs, but I've dealt with a lot of python environments for typical CRUD stuff, and while rough at times, it was never this bad.
It's like Python 2 vs Python 3 except even worse.