
223 points by benkaiser | 1 comment
jsenn No.42545081
Has there been any serious study of exactly how LLMs store and retrieve memorized sequences? There are so many interesting basic questions here.

Does verbatim completion of a Bible passage look different from generation of a novel sequence in interesting ways? How many sequences of this length do they memorize? Do the memorized ones roughly correspond to things humans would find important enough to memorize, or do LLMs memorize just as much SEO garbage as they do Bible passages?

1. nwatson No.42546473
I imagine Bible passages, at least the more widely quoted and discussed ones, appear many, many times: in the various available translations, in inspirational, devotional, and scholarly articles, in sermon transcripts, and so on. This surely reinforces almost word-for-word recall. SEO garbage is a bit different each time, so common SEO-reinforced themes might be recalled in LLM output, but not word for word.
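A toy way to see the repetition effect described above is a bigram next-word counter. This is a deliberately crude stand-in (it says nothing about how transformers actually store sequences), and the corpus is hypothetical: one verse fragment repeated verbatim 50 times, plus an SEO-style theme written four different ways. The counter's next-word choice after a verse word is unambiguous, while after the SEO theme word it is split across the variants.

```python
from collections import Counter, defaultdict


def bigram_counts(corpus):
    """Map each word to a Counter of the words that follow it."""
    follows = defaultdict(Counter)
    for doc in corpus:
        words = doc.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows


def greedy_continue(follows, start, max_words=10):
    """Repeatedly append the most frequent follower of the last word."""
    out = [start]
    for _ in range(max_words):
        nxt = follows.get(out[-1])  # .get avoids creating empty entries
        if not nxt:
            break  # no observed follower: stop
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)


def top_share(follows, word):
    """Fraction of a word's followers taken by its single most common one."""
    c = follows[word]
    return c.most_common(1)[0][1] / sum(c.values())


# Hypothetical toy corpus: one verse fragment quoted verbatim many times,
# and one SEO-style theme ("best ... flights") phrased differently each time.
verse = "blessed are the poor in spirit"
seo_variants = [
    "best cheap flights deals today",
    "best budget flights for students",
    "best last minute flights online",
    "best discount flights near you",
]
corpus = [verse] * 50 + seo_variants * 5

follows = bigram_counts(corpus)
print(greedy_continue(follows, "blessed"))  # the verse comes back verbatim
print(top_share(follows, "poor"))           # one follower takes all the mass
print(top_share(follows, "best"))           # mass split across four wordings
```

In this toy, the verse words each have exactly one observed follower, so greedy continuation reproduces the fragment word for word. The word "best" has four equally frequent followers, so only the theme, not any particular wording, is reinforced.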