I love how LLMs have got the attention of proper programmers such that the Python mess is getting cleaned up.
C has an almost infinite horizon for optimization. Python is good for prototypes, but we are beyond that stage now.
99% of LLM evaluation with PyTorch was already done in C++.
These .cpp projects don't improve anything for performance. They just drop dependencies necessary for training and experimentation.
Optimization isn't just about speed. As you said, dropping dependencies makes it portable, embeddable, and more versatile.
It's also nice to not lose your mind over how crazy Python and Docker are, when all you want to do is run inference in a shell script as though it were the `cat` command. That sacred cow is going to have to come out of the temple sooner or later, and when that happens, people are going to think, wow, it's just a cow.