LLMs can get "brain rot"

(llm-brain-rot.github.io)
466 points by tamnd | 2 comments
1. Animats No.45666095
The two big problems listed:

* Thought-skipping as the primary lesion: models increasingly truncate or skip reasoning chains, explaining most of the error growth.

* Popularity as a better indicator: a tweet's popularity, a non-semantic metric, predicts the Brain Rot effect better than its length does under the M1 intervention (a toy version of this selection is sketched after this list).

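To make the popularity claim concrete, here is a toy version of that M1-style selection: treat short, highly popular tweets as the junk pool and long, obscure ones as the control pool. The field names and thresholds below are illustrative guesses, not the paper's actual values.

    # Toy M1-style "engagement degree" split: short + popular -> junk,
    # long + obscure -> control. Field names and thresholds are
    # illustrative guesses, not the paper's actual values.
    def split_by_engagement(tweets, max_junk_tokens=30, min_junk_pop=500):
        junk, control = [], []
        for t in tweets:
            n_tokens = len(t["text"].split())
            popularity = t["likes"] + t["retweets"]  # non-semantic signal
            if n_tokens < max_junk_tokens and popularity > min_junk_pop:
                junk.append(t)       # short and popular -> junk pool
            elif n_tokens >= max_junk_tokens and popularity <= min_junk_pop:
                control.append(t)    # long and obscure -> control pool
        return junk, control
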
That's what you'd expect. Popular culture content tends to jump from premise to conclusion without showing the work. Train on popular culture and you get that. Really, what's supposed to come from training on the Twitter firehose? (Can you still buy that feed? Probably not.) This is a surprise-free result.
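
A crude way to spot that premise-to-conclusion jump in model output: count intermediate reasoning markers before the final answer and flag responses with too few. The marker list and threshold are my own illustration, not the paper's scoring method.

    import re

    # Crude "thought-skipping" detector: flag responses that jump from
    # premise to conclusion with too few intermediate reasoning steps.
    # Markers and threshold are illustrative, not the paper's metric.
    STEP_MARKERS = re.compile(
        r"(?i)\b(first|second|then|next|therefore|because|step \d+)\b"
    )

    def skips_thought(response: str, min_steps: int = 2) -> bool:
        return len(STEP_MARKERS.findall(response)) < min_steps

    print(skips_thought("The answer is 42."))                             # True
    print(skips_thought("First, factor it. Then divide. Therefore 42."))  # False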

At least have a curated model (no social media) and a junk model to compare.
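
If anyone does train that pair, the comparison itself is cheap: run the same eval set through both checkpoints and diff the pass rates. The generate and grade callables here are hypothetical scaffolding, not an existing API.

    # Sketch of the curated-vs-junk comparison: same prompts, same grader,
    # two checkpoints. The generate/grade callables are hypothetical.
    def pass_rate(generate, prompts, grade):
        """Fraction of prompts whose completion the grader accepts."""
        return sum(grade(p, generate(p)) for p in prompts) / len(prompts)

    def compare(gen_curated, gen_junk, prompts, grade):
        return {"curated": pass_rate(gen_curated, prompts, grade),
                "junk": pass_rate(gen_junk, prompts, grade)}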

replies(1): >>45671593 #
2. No.45671593