>"“Brain Rot” for LLMs isn’t just a catchy metaphor—it reframes data curation as cognitive hygiene for AI"
A metaphor is exactly what it is: not only do LLMs not possess human cognition, there is certainly no established science holding that they are literally valid subjects for clinical psychological assessment.
How does this stuff get published? This is basically a blog post. One of the worst aspects of the whole AI craze is that it has turned a non-trivial amount of academia into a complete cargo cult joke.
I think it's intended as a catchy warning to people who are dumping every piece of the internet (and synthetic data based on it!) into training that there are repercussions.
- consumer marketing
- politics
- venture fundraising
When any system has a few power law winners, it makes sense to grab attention.
Look at Trump and Musk and now Altman. They figured it out.
MrBeast...
Attention, even if negative, wedges you into the system and everyone's awareness. Your mousey quiet competitors aren't even seen or acknowledged. The attention grabbers suck all the oxygen out of the room and win.
If you go back and look at any victory, was it really better solutions, or was it the fact that better solutions led to more attention?
"Look here" -> build consensus and ignore naysayers -> keep building -> feedback loop -> win
It might not just be a societal algorithm. It might be one of the universe's fundamental greedy optimization algorithms. It might underpin lots of systems, including how we ourselves as individuals think and learn.
Our pain receptors. Our own intellectual interests and hobbies. Children learning on the playground. Ant colonies. Bee swarms. The world is full of signals, and there are mechanisms which focus us on the right stimuli.
The two bits about this paper that I think are worth calling out specifically:
- A reasonable amount of post-training can't save you when your pretraining comes from a bad pipeline; i.e., even if the syntax of the pretraining data is legitimate, the model has learned some bad implicit behavior (thought skipping)
- Trying to classify "bad data" is itself a nontrivial problem. Here the engagement-based heuristic actually proved more reliable than an LLM classification of the content (see the sketch after this list)
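To make that second point concrete, here is a minimal sketch of the two classification strategies being contrasted: a cheap engagement heuristic (flag short posts that went viral as "junk") versus asking an LLM judge to score content quality. The field names, thresholds, and the `judge` callable are all hypothetical illustrations, not the paper's actual pipeline.

```python
# Hypothetical sketch of the two "junk data" classifiers discussed above.
# Thresholds, field names, and the judge() helper are invented for
# illustration; the paper's actual filtering pipeline may differ.

from dataclasses import dataclass

@dataclass
class Post:
    text: str
    likes: int
    reposts: int

def junk_by_engagement(post: Post,
                       max_words: int = 30,
                       min_engagement: int = 500) -> bool:
    """Heuristic: short posts with high engagement are treated as 'junk'.
    Cheap and deterministic, and in this case more reliable than a
    semantic judgment of the content."""
    word_count = len(post.text.split())
    engagement = post.likes + post.reposts
    return word_count <= max_words and engagement >= min_engagement

def junk_by_llm(post: Post, judge) -> bool:
    """Alternative: ask an LLM judge to rate content quality.
    `judge` is a hypothetical callable returning a 0-1 quality score."""
    return judge(post.text) < 0.5

# Usage: filter a corpus before pretraining.
corpus = [
    Post("you wont BELIEVE what happened next", 12_000, 4_000),
    Post("A walkthrough of B-tree rebalancing with worked examples...", 40, 3),
]
clean = [p for p in corpus if not junk_by_engagement(p)]
```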
The idea that LLMs are just trained on a pile of raw Internet is severely outdated. (I'm not sure it was ever fully true, but it's far from that by now.)
Coding is one of the easier datasets to curate, because we have a number of ways to actually (somewhat) assess code quality. (Does it run? Does it come with a set of tests, and do they pass? Does it have stylistic integrity? How many issues get flagged by various analysis tools? Etc.)
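As a rough sketch of what combining those signals might look like, here is a small scorer that checks whether a repo compiles, ships tests that pass, and lints cleanly. The tool choices (pytest, ruff), equal weighting, and threshold are assumptions for illustration, not any known production curation pipeline.

```python
# Sketch of scoring a code sample for dataset curation using the checks
# listed above. Tools and weights are illustrative assumptions.

import subprocess
from pathlib import Path

def run(cmd: list[str], cwd: Path) -> bool:
    """Return True if the command exits cleanly in the given directory."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True).returncode == 0

def score_repo(repo: Path) -> float:
    """Combine a few cheap signals into a 0-1 quality score."""
    signals = {
        "compiles": run(["python", "-m", "compileall", "-q", "."], repo),
        "has_tests": any(repo.rglob("test_*.py")),
        "tests_pass": run(["python", "-m", "pytest", "-q"], repo),
        "lints_clean": run(["ruff", "check", "."], repo),
    }
    return sum(signals.values()) / len(signals)

# Usage: keep only repos above some threshold when building the corpus, e.g.
# keep = [r for r in candidate_repos if score_repo(r) >= 0.75]
```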
> (...) We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
The intent was for you to read my comment at face value. I have a point tangential to the discussion at hand that is additive.
LLMs trained on me (and the Hacker News corpus), not the other way around.