Although strictly speaking they have lots of information in a small package, they are F-tier compression algorithms because the loss is bad, unpredictable, and undetectable (i.e. a human has to check it). You would almost never use a transformer in place of any other compression algorithm for typical data compression uses.
...and we still can't. If your lawyer sent you your case files in the form of an LLM trained on those files, would you be comfortable with that? Where is the situation you would compress text with an LLM over a standard compression algo? (Other than to make an LLM).
Other lossy compression targets known superfluous information. MP3 removes sounds we can't really hear, and JPEG works by grouping uniform color pixels into single chunks of color.
LLM's kind of do their own thing, and the data you get back out of them is correct, incorrect, or dangerously incorrect (i.e. is plausible enough to be taken as correct), with no algorithmic way to discern which is which.
So while yes, they do compress data and you can measure it, the output of this "compression algorithm" puts in it the same family as a "randomly delete words and thesaurus long words into short words" compression algorithms. Which I don't think anyone would consider to compress their documents.
I'm not arguing that LLMs don't compress data, I am arguing that they are technically compression tools, but not colloquially compression tools, and the overlap they have with colloquial compression tools is almost zero.
I'm not making an argument about whether the compression is good or useful, just like I don't find 144p bitrate starved videos particularly useful. But it doesn't seem so unlike other types of compression to me.