Although strictly speaking they pack lots of information into a small package, they are F-tier compression algorithms, because the loss is bad, unpredictable, and undetectable (i.e. a human has to check it). You would almost never use a transformer in place of any other compression algorithm for typical data compression uses.
In one view, LLMs are SOTA lossless compression algorithms, where the weights don’t count towards the description length. Sounds crazy but it’s true.
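The standard construction behind that claim is arithmetic coding driven by the model's next-token probabilities: sender and receiver both hold the model, so only the code needs to be transmitted, and the better the model predicts the text, the shorter the code. A toy sketch with exact fractions (the "model" here is a made-up fixed distribution standing in for an LLM's predictions):

```python
from fractions import Fraction

# Stand-in for an LLM's next-token distribution (hypothetical numbers).
# A real LLM would give a fresh distribution at each step, conditioned
# on the prefix; the scheme below works identically either way.
MODEL = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
SYMBOLS = sorted(MODEL)

def cum_range(sym):
    """Cumulative-probability interval [lo, hi) assigned to a symbol."""
    lo = Fraction(0)
    for s in SYMBOLS:
        if s == sym:
            return lo, lo + MODEL[s]
        lo += MODEL[s]
    raise KeyError(sym)

def encode(msg):
    """Narrow [0, 1) by each symbol's interval; return a point inside."""
    low, high = Fraction(0), Fraction(1)
    for sym in msg:
        span = high - low
        s_lo, s_hi = cum_range(sym)
        low, high = low + span * s_lo, low + span * s_hi
    return (low + high) / 2  # any number in the final interval decodes

def decode(code, length):
    """Replay the same model to invert encode() exactly."""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        span = high - low
        pos = (code - low) / span
        for s in SYMBOLS:
            s_lo, s_hi = cum_range(s)
            if s_lo <= pos < s_hi:
                out.append(s)
                low, high = low + span * s_lo, low + span * s_hi
                break
    return "".join(out)

msg = "abacab"
assert decode(encode(msg), len(msg)) == msg  # lossless round trip
```

The round trip is exactly lossless, and the interval width equals the product of the symbol probabilities, so the code length approaches the model's log-loss on the text. The model's parameters never appear in the transmitted code, which is the sense in which the weights don't count.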
Compressing a comprehensive command line reference via a model might introduce errors and drop some options.
But for many people, especially new users, referencing commands and getting examples via a model delivers many times the value.
Lossy vs. lossless are fundamentally different, but so are use cases.
and his last before departing for Meta Superintelligence https://www.youtube.com/live/U-fMsbY-kHY?si=_giVEZEF2NH3lgxI...