
LLMs can get "brain rot"

(llm-brain-rot.github.io)
466 points by tamnd | 2 comments
pixelmelt
Isn't this just garbage in, garbage out with an attention-grabbing title?
1. icyfox
Yes, garbage in / garbage out still holds for most of LLM training.

The two bits about this paper that I think are worth calling out specifically:

- A reasonable amount of post-training can't save you when your pretraining comes from a bad pipeline; i.e., even if the syntax of the pretraining data is legitimate, the model still picks up bad implicit behaviors (thought skipping)

- Classifying "bad data" is itself a nontrivial problem. Here the engagement-based heuristic actually proved more reliable than having an LLM classify the content (both approaches are sketched below)
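
A minimal sketch of those two selection strategies in Python; the Post fields, thresholds, and judge prompt are illustrative assumptions on my part, not the paper's exact criteria:

    from dataclasses import dataclass

    @dataclass
    class Post:
        text: str
        likes: int
        reposts: int

    def junk_by_engagement(post: Post, max_len: int = 200, min_engagement: int = 500) -> bool:
        """Heuristic: short posts with high engagement are flagged as likely junk."""
        return len(post.text) < max_len and (post.likes + post.reposts) >= min_engagement

    def junk_by_llm(post: Post, judge) -> bool:
        """LLM-as-judge: ask another model to rate semantic quality.

        `judge` is any callable mapping a prompt string to a text response;
        wiring it to a real API is left out of this sketch.
        """
        prompt = (
            "Classify the following post as JUNK (clickbait, exaggerated, shallow) "
            f"or QUALITY. Answer with one word.\n\n{post.text}"
        )
        return judge(prompt).strip().upper().startswith("JUNK")

    # Example with the heuristic alone:
    posts = [
        Post("You WON'T BELIEVE what this model just did...", likes=12_000, reposts=3_400),
        Post("Long, careful write-up of the ablation results.", likes=12, reposts=1),
    ]
    flagged = [p.text for p in posts if junk_by_engagement(p)]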

2. satellite2
Yes, but the other interesting bit, which isn't clearly addressed, is that pushing the garbage share of the input to 100% does not produce absolute garbage out. So evidently there is still something being learned there.
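
One way to read that observation is as a dose-response curve: train otherwise-identical models on mixtures with an increasing junk fraction and score each on the same benchmark. A rough sketch of that harness in Python, where train_on_mixture and evaluate are hypothetical placeholders rather than the paper's code:

    def junk_ratio_sweep(clean_docs, junk_docs, train_on_mixture, evaluate,
                         ratios=(0.0, 0.2, 0.5, 0.8, 1.0)):
        """Score a model trained at each junk ratio; 1.0 means all-junk data."""
        scores = {}
        n = min(len(clean_docs), len(junk_docs))  # keep total corpus size fixed
        for r in ratios:
            k = int(r * n)
            mixture = junk_docs[:k] + clean_docs[: n - k]
            model = train_on_mixture(mixture)   # placeholder training step
            scores[r] = evaluate(model)         # placeholder held-out benchmark
        return scores

If even the 100% point stays above chance, that residual score is the "something still being learned" here.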