1 point by moondistance | 3 comments
1. semessier No.45757821
Hence lossless does not seem plausible
replies(1): >>45757877 #
2. yorwba No.45757877
Importantly, they're talking about continuous representations, i.e. the output logits. For there to be a loss, you'd need two different tokens to produce the exact same logits, which is even less plausible. But as soon as you sample discrete output tokens from the distribution defined by the logits, you do end up losing information. So the practical relevance of this paper is somewhat limited.
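
A minimal sketch of the sampling point, assuming a toy three-token vocabulary and made-up logit values (none of this is taken from the paper). Two different logit vectors give two different continuous distributions, yet greedy decoding maps both to the same discrete token, so the token alone cannot tell you which logits produced it:

```python
import math

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_token(logits):
    """Pick the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical logits for two different inputs; the values are illustrative only.
logits_a = [2.0, 0.5, -1.0]
logits_b = [2.1, 0.4, -1.0]

probs_a = softmax(logits_a)
probs_b = softmax(logits_b)

# The continuous representations differ, so they still distinguish the two inputs.
distinct_continuous = probs_a != probs_b

# The discrete outputs coincide, so that distinction is gone after decoding.
token_a = greedy_token(logits_a)
token_b = greedy_token(logits_b)
same_token = token_a == token_b

print(distinct_continuous, same_token)
```

The same collapse happens with temperature sampling instead of greedy decoding: many distinct distributions can emit the same token, so the map from logits to a sampled token is many-to-one.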