
373 points ibobev | source
mingtianzhang ◴[] No.45067101[source]
It would be interesting to add more lossless compression stuff, which has a close connection to generative AI.

This PhD thesis gives a very good introduction: https://arxiv.org/abs/2104.10544

replies(1): >>45067799 #
roadside_picnic ◴[] No.45067799[source]
You don't need to restrict it to lossless compression; in fact, nearly all machine learning can be understood as a type of (typically lossy) compression. As a trivial example, you can imagine sending a semantic embedding across a channel rather than the full text, provided the embedding still contains adequate information to perform the task. Similarly, all classification can be viewed as compressing the data so much that you're left with only a latent representation of the general category the item belongs to.
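
As a rough sketch of that "embedding as a lossy channel" picture (the embed() below is just a toy bag-of-words hashing encoder, not any particular model; a real system would use a learned encoder):

    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        """Toy encoder: hash each token into a fixed-size vector."""
        vec = np.zeros(dim, dtype=np.float32)
        for token in text.lower().split():
            vec[hash(token) % dim] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    doc = "a long document about compression and generative models " * 100
    z = embed(doc)
    print(len(doc.encode()), "bytes of text ->", z.nbytes, "bytes of embedding")
    # The original text is unrecoverable from z (lossy), but a downstream
    # task like search or classification may only need what z preserves.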

In the context of generative AI, it's precisely because we're dealing with lossy compression that it works at all. It's a case where intentionally losing information, and being forced to interpolate the missing data, opens up a path toward generalization.

Lossless LLMs would not be very interesting (beyond the typical uses we have for lossless compression). That paper is interesting because it uses lossless compression, which is rather unusual in the world of machine learning.
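
For intuition on why a generative model doubles as a lossless compressor, here's a rough sketch of the standard source-coding view, not the thesis's exact setup: an entropy coder can get within about a bit of sum(-log2 p) over the sequence, so better next-symbol predictions mean shorter codes. The character bigram model below is just a stand-in for an LLM's predictive distribution.

    import math
    from collections import Counter, defaultdict

    def ideal_code_length_bits(text: str) -> float:
        # Fit a character bigram model on the text itself (illustration only;
        # a real codec must use a model both encoder and decoder share,
        # e.g. a pretrained generative model).
        counts = defaultdict(Counter)
        for a, b in zip(text, text[1:]):
            counts[a][b] += 1
        bits = 8.0  # send the first character raw
        for a, b in zip(text, text[1:]):
            p = counts[a][b] / sum(counts[a].values())
            bits += -math.log2(p)  # ideal code length for this symbol
        return bits

    text = "the quick brown fox jumps over the lazy dog " * 20
    print(f"{ideal_code_length_bits(text):.0f} bits vs {8 * len(text)} raw bits")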

replies(3): >>45068195 #>>45071246 #>>45073444 #
1. andoando ◴[] No.45071246[source]
All learning, human or AI, is lossy compression.

It is by generalizing data that we form mental conceptions. A square is a square regardless of its size, color, or material. A house is a house so long as something lives there.