
223 points | benkaiser | 1 comment
ks2048 No.42538257
This is interesting. I'm curious about how much (and what) these LLMs memorize verbatim.

Does anyone know of more thorough papers on this topic? For example, this could be tested on every verse in the Bible, and on lots of other text that is certainly in the training data: books from Project Gutenberg, Wikipedia articles, etc.

Edit: this (and its references) looks like a good place to start: https://arxiv.org/abs/2407.17817v1
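
The test proposed above can be sketched in a few lines: prompt a model with the opening words of a known passage and score how much of the true continuation it reproduces word for word. Below is a minimal, hedged sketch of the scoring side; `query_model` is a hypothetical stand-in for whatever LLM API you use, and the word-level prefix metric is just one simple choice (papers on memorization often use token-level or substring metrics instead).

```python
def verbatim_prefix_score(reference: str, generated: str) -> float:
    """Fraction of the reference text reproduced verbatim from the start,
    compared word by word (stops at the first mismatch)."""
    ref_words = reference.split()
    gen_words = generated.split()
    matched = 0
    for r, g in zip(ref_words, gen_words):
        if r != g:
            break
        matched += 1
    return matched / len(ref_words) if ref_words else 0.0


def memorization_probe(passages, query_model, prompt_words=10):
    """For each named passage, prompt the model with its first `prompt_words`
    words and score the continuation against the true remainder.

    `query_model` is assumed to be a callable: prompt string -> completion
    string (e.g. a thin wrapper around your LLM API of choice)."""
    scores = {}
    for name, text in passages.items():
        words = text.split()
        prompt = " ".join(words[:prompt_words])
        truth = " ".join(words[prompt_words:])
        scores[name] = verbatim_prefix_score(truth, query_model(prompt))
    return scores
```

Run over every Bible verse (or Gutenberg paragraph), the resulting score distribution gives a rough, per-passage picture of verbatim recall; a score of 1.0 means the model reproduced the remainder exactly.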

1. int_19h No.42542876
For one anecdotal data point, GPT-4 knows the "Navy SEAL copypasta" verbatim. It can reproduce it complete with all the original typos and misspellings, and it can recognize it from the first sentence alone.