
LLMs can get "brain rot"

(llm-brain-rot.github.io)
466 points by tamnd | 3 comments
avazhi | No.45658886
“Studying “Brain Rot” for LLMs isn’t just a catchy metaphor—it reframes data curation as cognitive hygiene for AI, guiding how we source, filter, and maintain training corpora so deployed systems stay sharp, reliable, and aligned over time.”

An LLM-written line if I’ve ever seen one. Looks like the authors have their own brainrot to contend with.

standardly | No.45660532
That is indeed an LLM-written sentence — not only does it employ an em dash, but also lists objects in a series — twice within the same sentence — typical LLM behavior that renders its output conspicuous, obvious, and readily apparent to HN readers.
kragen | No.45667269
I've been doing that for decades. See for example https://www.mail-archive.com/kragen-tol@canonical.org/msg000...:

> Many programming languages provide an exception facility that terminates subroutines without warning; although they usually provide a way to run cleanup code during the propagation of the exception (finally in Java and Python, unwind-protect in Common Lisp, dynamic-wind in Scheme, local variable destructors in C++), this facility tends to have problems of its own --- if cleanup code run from it raises an exception, one exception or the other, or both, will be lost, and the rest of the cleanup code at that level will fail to run.
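The failure mode in the quoted passage can be reproduced with a minimal Python 3 sketch. The helpers `release` and `do_work` are hypothetical, made up for illustration. One nuance for Python 3 specifically: the earlier exception is not discarded outright, because it is kept as the new exception's `__context__`. However, the cleanup error is the one that propagates, and the remaining cleanup statements in the `finally` block never run.

```python
# Sketch: an exception raised by cleanup code in a `finally` block
# replaces the exception that was already propagating, and the rest
# of that cleanup block is skipped.

cleanup_log = []

def release(name, fail=False):
    """Hypothetical cleanup helper; raises OSError if `fail` is set."""
    if fail:
        raise OSError(f"failed to release {name}")
    cleanup_log.append(name)

def do_work():
    try:
        raise ValueError("original error")
    finally:
        release("lock", fail=True)  # raises while ValueError is unwinding
        release("file")             # never reached

def main():
    try:
        do_work()
    except Exception as exc:
        return exc
    return None

exc = main()
print(type(exc).__name__)     # the cleanup error is what propagated
print(repr(exc.__context__))  # Python 3 keeps the original as context
print(cleanup_log)            # "file" was never released
```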

At the time I was using TeX em dashes rather than Unicode ones, but I switched pretty early on.

You can easily find human writers employing em dashes and comma-separated lists over several centuries.
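The "TeX em dashes" mentioned above are plain ASCII: in TeX input, `---` typesets as an em dash and `--` as an en dash, which is why the quoted passage contains `---`. A minimal sketch of converting that convention to Unicode follows; `tex_dashes_to_unicode` is a hypothetical helper, not part of any TeX tool, and it deliberately ignores edge cases such as code spans or runs of four or more hyphens.

```python
def tex_dashes_to_unicode(text: str) -> str:
    """Hypothetical helper: map TeX-style ASCII dash ligatures to Unicode.

    "---" becomes an em dash (U+2014) and "--" an en dash (U+2013).
    The longer sequence is replaced first; otherwise "---" would turn
    into an en dash followed by a stray hyphen.
    """
    return text.replace("---", "\u2014").replace("--", "\u2013")

sample = "this facility tends to have problems of its own --- if cleanup code"
print(tex_dashes_to_unicode(sample))
print(tex_dashes_to_unicode("pages 3--5 of a well-known book"))
```

Single hyphens, as in "well-known", pass through unchanged.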

_AzMoo | No.45667337
Which is exactly why LLMs use these techniques so often. They're very common.
kragen | No.45667383
Well, em dashes are not all that common in text that people have written on computers, because em dashes were left out of ASCII. They're common in high-quality text like Wikipedia, academic papers, and published books.

My guess is that comma-separated lists tend to be a feature of text that is attempting to be either comprehensively expository—listing all the possibilities, all the relevant factors, etc.—or persuasive—listing a compelling set of examples or other supporting arguments so that at least one of them is likely to convince the reader.
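The point that em dashes were left out of ASCII is easy to check: hyphen-minus is the only dash-like character in 7-bit ASCII, while the en and em dashes sit far above code point 127. A minimal sketch using Python's standard `unicodedata` module (the `describe` helper is made up for illustration):

```python
import unicodedata

DASHES = ("-", "\u2013", "\u2014")  # hyphen-minus, en dash, em dash

def describe(ch: str) -> tuple[str, str, bool]:
    """Return (code point, Unicode name, whether the char is 7-bit ASCII)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.isascii())

for ch in DASHES:
    print(*describe(ch))

# Encoding an em dash with the ASCII codec fails: ASCII stops at 127.
try:
    "\u2014".encode("ascii")
except UnicodeEncodeError as err:
    print("em dash is not ASCII:", err.reason)
```

Because these are distinct code points, software sees a typed hyphen and a true em dash as different characters unless something normalizes one into the other.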

danielhughes | No.45669712
I was surprised to learn from your comment that em dashes were left out of ASCII, because I thought I'd been using them extensively in my writing. Perhaps I'm just relying heavily on the hyphen key. I mention it because instances of true em dash use (e.g., in the high-quality text you cite) and hyphen usage by people like me are likely close enough together in a vector space that LLMs perceive the general pattern of a little horizontal line in the middle of a sentence as a common writing style.

I find myself constantly editing my natural writing style to sound less like an AI, so this discussion of em dash use is a sore spot. Personally, I think many people overrate their ability to recognize AI-generated copy when they have no good feedback loop on their own false positives (or false negatives, for that matter).

kragen | No.45670628
On typewriters all characters are the same width, typically about ½em wide. Some of them compromised their hyphen so that you could join two of them together to form an em dash, but a good hyphen is closer to ¼em wide. But that compromise also meant that a single hyphen would work very well as an en dash. And generally hyphenation was not very important for typewriters because you couldn't produce properly justified text on a typewriter anyway, not without carefully preplanning each line before you began to type it.

Computers unfortunately inherited a lot of this typewriter crap.

Related compromises included having only a single " character; shaping it so that it could serve as a diaeresis if overstruck; shaping some apostrophes so that they could serve as either left or right single quotes and also form a decent ! if overstruck with a .; alternatively, shaping the apostrophe so that it could serve as an acute accent if overstruck, and providing a mirror-image left-quote character that doubled as a grave accent; and shaping the lowercase "l" as a viable digit "1", which more or less required the typewriter as a whole to use lining figures rather than the much nicer text figures.