
LLMs can get "brain rot"

(llm-brain-rot.github.io)

466 points tamnd | 2 comments | 21 Oct 25 14:24 UTC


pixelmelt ◴[21 Oct 25 15:33 UTC] No.45657074[source]▶

>>45656223 (OP) #

Isn't this just garbage in, garbage out with an attention-grabbing title?

replies(6): >>45657153 #>>45657205 #>>45657394 #>>45657412 #>>45657896 #>>45658420 #

1. icyfox ◴[21 Oct 25 16:38 UTC] No.45657896[source]▶

Yes - garbage in / garbage out still holds true for most things when it comes to LLM training.

The two bits about this paper that I think are worth calling out specifically:

- A reasonable amount of post-training can't save you when your pretraining comes from a bad pipeline; i.e., even if the syntax of the pretraining data is legitimate, the model has still learned some bad implicit behavior (thought skipping)

- Trying to classify "bad data" is itself a nontrivial problem. Here the engagement-based heuristic actually proved more reliable than LLM classification of the content
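
To make the second point concrete, here is a rough sketch of what an engagement-based junk filter could look like. This is not the paper's pipeline: the `Post` fields, the function name `is_junk_by_engagement`, and both thresholds are hypothetical, chosen only to illustrate flagging posts that are short and highly popular without looking at what they say.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record layout; real datasets will differ.
    text: str
    likes: int
    reposts: int


def is_junk_by_engagement(post: Post, max_tokens: int = 30,
                          min_popularity: int = 500) -> bool:
    """Flag a post as junk if it is short AND highly popular.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    n_tokens = len(post.text.split())          # crude whitespace token count
    popularity = post.likes + post.reposts     # simple engagement score
    return n_tokens < max_tokens and popularity > min_popularity


posts = [
    Post("you won't believe this", likes=900, reposts=400),  # short, viral
    Post("long text " * 40, likes=900, reposts=400),         # viral but long
    Post("quiet short note", likes=2, reposts=0),            # short, unpopular
]
flags = [is_junk_by_engagement(p) for p in posts]
```

The appeal of a filter like this is that it never has to judge the content itself, which is exactly the part that is hard to get right with a classifier.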

replies(1): >>45659231 #

2. satellite2 ◴[21 Oct 25 18:01 UTC] No.45659231[source]▶

>>45657896 (TP) #

Yes, but the other interesting bit, which is not clearly addressed, is that increasing the garbage in to 100% does not result in absolute garbage out. So evidently there is still something to learn there.