
An LLM is a lossy encyclopedia

(simonwillison.net)
509 points | tosh | 1 comment

(the referenced HN thread starts at https://news.ycombinator.com/item?id=45060519)
1. rob_c No.45100787
Yes, and working out how to disentangle the information-storage mechanisms from, say, the language processing is a massive area of interest. The only problem with Attention Transformers, imo, is that they're a bit too good :p

Imagine a slightly lossy compression algorithm that could store 10x or 100x more than the current best lossless schemes while maintaining 99.999% fidelity on recall. Probably, very probably, a pipe dream. But then why do large on-device models seem to be able to remember just about everything from Wikipedia, and store it in a smaller format than a direct archive of the source material? (Look at the current best diffusion models as well.)
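
A rough back-of-envelope sketch of the size comparison being gestured at here (not from the comment; every figure below is my own order-of-magnitude assumption: a compressed English Wikipedia text dump of roughly ~20 GB versus a hypothetical 8-billion-parameter on-device model quantized to 4 bits):

    # Back-of-envelope comparison: lossy "knowledge store" (model weights)
    # vs. a lossless archive (compressed Wikipedia text).
    # All figures are assumed, order-of-magnitude values for illustration only.

    wiki_compressed_gb = 22.0   # assumed: compressed English Wikipedia text dump
    model_params = 8e9          # assumed: 8-billion-parameter on-device model
    bytes_per_param = 0.5       # assumed: 4-bit quantization = 0.5 bytes/param

    model_size_gb = model_params * bytes_per_param / 1e9
    ratio = wiki_compressed_gb / model_size_gb

    print(f"quantized model weights:   ~{model_size_gb:.0f} GB")
    print(f"compressed Wikipedia text: ~{wiki_compressed_gb:.0f} GB")
    print(f"model is ~{ratio:.1f}x smaller, but its recall is lossy, not exact")

Under those assumptions the weights come out several times smaller than the compressed text they were (partly) trained on, which is the sense in which the weights behave like a very lossy, very high-ratio compression of the corpus rather than an archive of it.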