
LLMs can get "brain rot"

(llm-brain-rot.github.io)
466 points | tamnd | 5 comments
pixelmelt No.45657074
Isn't this just garbage in garbage out with an attention grabbing title?
replies(6): >>45657153 #>>45657205 #>>45657394 #>>45657412 #>>45657896 #>>45658420 #
wat10000 No.45657205
Considering that the current state of the art for LLM training is to feed it massive amounts of garbage (with some good stuff alongside), it seems important to point this out even if it might seem obvious.
replies(1): >>45657247 #
CaptainOfCoit No.45657247
I don't think anyone is throwing raw datasets into LLMs and hoping for high-quality weights anymore. Nowadays most datasets are filtered one way or another, and some of them are even highly curated.
replies(1): >>45657546 #
1. BoredPositron No.45657546
I doubt they are highly curated; you would need experts in every field to do so. Which gives me more anxiety about LLM performance, because one of the most curated fields should be code...
replies(3): >>45657692 #>>45657999 #>>45659279 #
2. nradov No.45657692
OpenAI has been literally hiring human experts in certain targeted subject areas to write custom proprietary training content.
replies(1): >>45657779 #
3. BoredPositron No.45657779
I bet the dataset is mostly comprised of certain areas™.
4. groby_b No.45657999
The major labs are hiring experts. They carefully build & curate synthetic data. The market for labelled non-synthetic data is currently ~$3B/year.

The idea that LLMs are just trained on a pile of raw Internet is severely outdated. (Not sure it was ever fully true, but it's far away from that by now).

Coding is one of the easier datasets to curate, because we have a number of ways to actually (somewhat) assess code quality. (Does it work? Does it come with a set of tests, and does it pass them? Does it have stylistic integrity? How many issues get flagged by various analysis tools? Etc, etc)
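The kind of first-pass filter described above can be sketched with nothing but the standard library. This is purely illustrative (the function name and the "has a docstring" heuristic are my own, not anything a lab has published); a real curation pipeline would additionally run the code's own test suite and static analyzers:

```python
import ast

def passes_basic_quality_checks(source: str) -> bool:
    """Hypothetical first-pass filter for a code training corpus.

    Checks only what is cheap to check: does the snippet parse,
    and does at least one function or class carry a docstring
    (a crude proxy for "stylistic integrity")?
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Broken code is rejected outright.
        return False
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node):
                return True
    return False
```

In practice each check would be one stage in a cascade, with cheap syntactic filters like this one running before expensive steps such as executing tests in a sandbox.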

5. satellite2 No.45659279
Is that right? Isn't the current way of doing things to throw "everything" at it and then fine-tune?