I’m teaching myself LLM internals by re-implementing the stack from first principles. Profiling TikToken’s Python/Rust implementation showed that a lot of time was spent on regex matching. Most of my perf gains come from a) using a faster JIT-compiled regex engine; and b) simplifying the algorithm to forgo regex matching of special tokens altogether.
Benchmarking code is included. Notable results:
- 4x faster code sample tokenization on a single thread.
- 2-3x higher throughput when tested on a 1GB natural language text file.
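To make point (b) concrete, here is a minimal Python sketch of one way to keep special tokens out of the regex entirely: locate them with plain substring search and only run the regex on the text in between. This is not the project's actual code; the pattern `WORD_PATTERN`, the token list `SPECIAL_TOKENS`, and both function names are assumptions made up for illustration.

```python
import re

# Simplified pre-tokenization pattern (an assumption for illustration;
# production tokenizers use a more elaborate pattern).
WORD_PATTERN = re.compile(r"\s?\w+|\s?[^\w\s]+|\s+")

# Hypothetical special tokens.
SPECIAL_TOKENS = ("<|endoftext|>", "<|pad|>")


def split_on_special(text, specials=SPECIAL_TOKENS):
    """Yield (chunk, is_special) pairs using plain substring search."""
    pos = 0
    while pos < len(text):
        # Find the earliest occurrence of any special token at or after pos.
        best_idx, best_tok = -1, None
        for tok in specials:
            idx = text.find(tok, pos)
            if idx != -1 and (best_idx == -1 or idx < best_idx):
                best_idx, best_tok = idx, tok
        if best_tok is None:
            # No special token left: the rest is ordinary text.
            yield text[pos:], False
            return
        if best_idx > pos:
            yield text[pos:best_idx], False
        yield best_tok, True
        pos = best_idx + len(best_tok)


def pre_tokenize(text):
    """Split text into pieces; the regex only ever sees ordinary text."""
    pieces = []
    for chunk, is_special in split_on_special(text):
        if is_special:
            pieces.append(chunk)
        else:
            pieces.extend(WORD_PATTERN.findall(chunk))
    return pieces


print(pre_tokenize("Hello world<|endoftext|>Hi!"))
```

The regex stays small because it never has to carry an alternation over every special token. Whether that helps in practice depends on the engine and the number of special tokens, so treat it as something to benchmark rather than a given.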
Iteration speed trumps all in research. Most of what Python does is launch GPU operations; if you're having slowdowns from Pythonland, then you're doing something terribly wrong.
Python is an excellent (and yes, fast!) language for orchestrating and calling ML stuff. If C++ code is needed, call it as a module.
If you think Python is a bad language for AI integrations, try writing one in a compiled language.
ML researchers aren’t using Python because they are dumb. They use it because what takes 8 lines in Java can be done with 2 or 3 (including import json) in Python, for example.
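As a rough illustration of the "2 or 3 lines" claim, parsing a JSON string and reading a field looks like this. The payload and the `config` name are made up for the example.

```python
import json

config = json.loads('{"model": "gpt2", "layers": 12}')
print(config["layers"])
```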
So great there are 8 of them. 800% better than all the rest!
> If you think Python is a bad language for AI integrations, try writing one in a compiled language.
I'll take this challenge, all day, every day, so long as I and the hypothetical 'move fast and break things' have equal "must run in prod" and "must be understandable by some other human" qualifiers.
What type is `array`? Don't worry your pretty head about it, feed it whatever type you want and let Sentry's TypeError sort it out: <https://github.com/openai/whisper/blob/v20250625/whisper/aud...> Oh, sorry, and you wanted to know what `pad_or_trim` returns? Well, that's just, like, your opinion, man.
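For contrast, this is roughly what a signature that answers both questions could look like with type hints. It is a hypothetical stand-in over plain Python sequences, not Whisper's implementation, and the parameter names are assumptions.

```python
from typing import Sequence


def pad_or_trim(samples: Sequence[float], length: int) -> list[float]:
    """Return exactly `length` samples: trim extras or zero-pad the end."""
    trimmed = list(samples[:length])
    return trimmed + [0.0] * (length - len(trimmed))
```

The annotations are only checked if a type checker runs over the code; Python itself does not enforce them at runtime.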
I'm still teaching them Python.