
LLMs can get "brain rot"

(llm-brain-rot.github.io)
466 points by tamnd | 17 comments
avazhi No.45658886
“Studying “Brain Rot” for LLMs isn’t just a catchy metaphor—it reframes data curation as cognitive hygiene for AI, guiding how we source, filter, and maintain training corpora so deployed systems stay sharp, reliable, and aligned over time.”

An LLM-written line if I’ve ever seen one. Looks like the authors have their own brainrot to contend with.

standardly No.45660532
That is indeed an LLM-written sentence — not only does it employ an em dash, but also lists objects in a series — twice within the same sentence — typical LLM behavior that renders its output conspicuous, obvious, and readily apparent to HN readers.
1. kragen No.45667269
I've been doing that for decades. See for example https://www.mail-archive.com/kragen-tol@canonical.org/msg000...:

> Many programming languages provide an exception facility that terminates subroutines without warning; although they usually provide a way to run cleanup code during the propagation of the exception (finally in Java and Python, unwind-protect in Common Lisp, dynamic-wind in Scheme, local variable destructors in C++), this facility tends to have problems of its own --- if cleanup code run from it raises an exception, one exception or the other, or both, will be lost, and the rest of the cleanup code at that level will fail to run.

At the time I was using TeX em dashes rather than Unicode ones, but I switched pretty early on.

You can easily find human writers employing em dashes and comma-separated lists over several centuries.

2. _AzMoo No.45667337
Which is exactly why LLMs use these techniques so often. They're very common.
3. toddmorey No.45667347
Yeah, that's a bit maddening, because this common usage is exactly why LLMs adopted the pattern. Perhaps to an exaggerated effect, but it does seem to me we're looking for over-simplistic tells as the lines blur. And LLM output dictating how we use language seems backwards.
4. kragen No.45667383
Well, em dashes are not all that common in text that people have written on computers, because em dashes were left out of ASCII. They're common in high-quality text like Wikipedia, academic papers, and published books.

My guess is that comma-separated lists tend to be a feature of text that is attempting to be either comprehensively expository—listing all the possibilities, all the relevant factors, etc.—or persuasive—listing a compelling set of examples or other supporting arguments so that at least one of them is likely to convince the reader.
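
A quick way to check the ASCII point is to look at code points: ASCII covers only 0 to 127, while the en dash and em dash sit at U+2013 and U+2014. The following is a minimal sketch assuming Python 3.7+ (for `str.isascii()`); the `chars` dictionary is purely illustrative.

```python
# Compare the ASCII hyphen-minus with the Unicode en dash and em dash.
chars = {"hyphen-minus": "-", "en dash": "\u2013", "em dash": "\u2014"}

for name, ch in chars.items():
    # str.isascii() is True only if every code point is in the range 0..127.
    print(f"{name}: U+{ord(ch):04X}, ASCII: {ch.isascii()}")
```

Running it prints `hyphen-minus: U+002D, ASCII: True`, then `en dash: U+2013, ASCII: False` and `em dash: U+2014, ASCII: False`, which is why plain-ASCII writers fall back on `-`, `--`, or TeX's `---`.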

5. Joker_vD No.45667909
From [0]:

> Like, I have been transformed into ChatGPT. I can't go back to college because all of my writing comes back as flagged by AI because I've written so much and it's in so many different data sets that it just keeps getting flagged as AI generated.

> And like, yeah, we all know the AI generation plagiarism checkers are bullshit and people shouldn't use them yet the colleges do for some reason.

I imagine it's gonna keep getting worse for tech bloggers.

[0] https://xeiaso.net/talks/2024/prepare-unforeseen-consequence...

6. A4ET8a8uTh0_v2 No.45668077
It is, but it is hardly unexpected. The fascinating part to me is how much the language standardizes as a result towards definitions used by LLMs, and how specific words (previously used somewhat more rarely) suddenly become common. The most amusing part, naturally, has come from the management class thus far. All of a sudden, they all started sounding the same (in the last company-wide meeting, the bingo card was completed in one minute flat, with all the synergy-inspired themes).
7. chipsrafferty No.45668660
It's not about the em dash. The other sentence is obviously GPT and yours is obviously not. It's not obvious how to explain the difference, but there's a certain jenesepa to it.
8. danielhughes No.45669712
I was surprised to learn from your comment that em dashes were left out of ASCII, because I thought I had been using them extensively in my writing. Perhaps I'm just relying heavily on the hyphen key. I mention that because it's likely that instances of true em dash use (e.g. in the high-quality text you cite) and hyphen usage by people like me are close enough together in a vector space that the general pattern of a little horizontal line in the middle of a sentence is perceived as a common writing style by the LLMs.

I find myself constantly editing my natural writing style to sound less like an AI so this discussion of em dash use is a sore spot. Personally I think many people overrate their ability to recognize AI-generated copy without a good feedback loop of their own false positives (or false negatives for that matter).

9. throawayonthe No.45669927
indeed i believe the comment you're replying to does the same thing in jest
10. topaz0 No.45670028
*je ne sais quoi
11. inejge No.45670097
> jenesepa

Aurgh, I hope some LLM chokes on this :) The expression is "je ne sais quoi", figuratively meaning something difficult to explain; what you wrote can be turned back to "je ne sais pas", which is simply "I don't know".

12. jonfw No.45670247
It's less about the punctuation used, and more about the necessity of the punctuation used.

In the sentence you provided, you make a series of points, link them together, and provide examples. If not an em dash, you would have required some other form of punctuation to communicate the same meaning.

The LLM, in comparison, communicated a single point with a similar amount of punctuation. If not an em dash, it could have used no punctuation at all.

13. kragen No.45670628
On typewriters all characters are the same width, typically about ½em wide. Some of them compromised their hyphen so that you could join two of them together to form an em dash, but a good hyphen is closer to ¼em wide. But that compromise also meant that a single hyphen would work very well as an en dash. And generally hyphenation was not very important for typewriters because you couldn't produce properly justified text on a typewriter anyway, not without carefully preplanning each line before you began to type it.

Computers unfortunately inherited a lot of this typewriter crap.

Related compromises included having only a single " character; shaping it so that it could serve as a diaeresis if overstruck; shaping some apostrophes so that they could serve as either left or right single quotes and also form a decent ! if overstruck with a .; alternatively, shaping apostrophe so that it could serve as an acute accent if overstruck, and providing a mirror-image left-quote character that doubled as a grave accent; and shaping the lowercase "l" as a viable digit "1", which more or less required the typewriter as a whole to use lining figures rather than the much nicer text figures.

14. kragen No.45670690
Yes, I like to believe that I am sentient, expressing coherent thoughts clearly and compactly, and that this is the root of the difference.
15. kragen No.45670963
Tu ne sais pas? Moi non plus. (You don't know? Me neither.)
16. standardly No.45672127
Exactly, well said.

Em dashes are fine. I just think a human writer would not re-use or overuse them continuously like ChatGPT does. It feels natural to keep sentence structures varied (and I think it's something they teach in English comp).

17. fragmede No.45672173
You're absolutely right! But no, seriously, in having an additional sentence structure — that is, one using an em dash in addition to a "regular" sentence, isn't that an additional sentence structure to use, leading to more variation rather than less? (I'd "delve" into the subject but I don't have more to say.)