358 points tkgally | 8 comments

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933
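
For anyone curious how such a count could work, here is a minimal sketch in Python. It is not the BigQuery query actually used for the leaderboard. The function name `em_dash_leaderboard`, the sample users, and the `(author, posted_at, text)` layout are all hypothetical, and it assumes that only the Unicode em dash (U+2014) counts, so a typed "---" is ignored.

```python
from collections import Counter
from datetime import datetime, timezone

EM_DASH = "\u2014"  # Unicode em dash; ASCII "--" or "---" is not matched
# Cutoff taken from the post: ChatGPT's release date (UTC midnight is an assumption)
CUTOFF = datetime(2022, 11, 30, tzinfo=timezone.utc)


def em_dash_leaderboard(posts, top_n=10):
    """Rank authors by number of pre-cutoff posts containing an em dash.

    posts: iterable of (author, posted_at, text) tuples (hypothetical layout),
    where posted_at is a timezone-aware datetime.
    """
    counts = Counter()
    for author, posted_at, text in posts:
        # Count each qualifying post once, however many dashes it contains
        if posted_at < CUTOFF and EM_DASH in text:
            counts[author] += 1
    return counts.most_common(top_n)


# Made-up sample data, for illustration only
SAMPLE_POSTS = [
    ("alice", datetime(2020, 1, 1, tzinfo=timezone.utc), "Yes\u2014exactly."),
    ("alice", datetime(2021, 5, 5, tzinfo=timezone.utc), "Another\u2014one."),
    ("bob", datetime(2019, 3, 2, tzinfo=timezone.utc), "A LaTeX habit --- three hyphens."),
    ("bob", datetime(2021, 7, 9, tzinfo=timezone.utc), "One\u2014here."),
    ("carol", datetime(2023, 1, 1, tzinfo=timezone.utc), "After\u2014the cutoff."),
]

print(em_dash_leaderboard(SAMPLE_POSTS))  # [('alice', 2), ('bob', 1)]
```

Counting posts rather than individual dashes matches the description above ("how many of their posts ... contained em dashes"). Whether ties, deleted posts, or stories versus comments were handled specially is not something this sketch tries to guess.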

tptacek ◴[] No.45071905[source]
The em-dash giveaway is an actual Unicode em-dash character, right? I professionally had to learn LaTeX to write a paper in the 1990s and have had a "---" habit ever since, and I've been wondering if that's some kind of weird LLM tell now.
replies(3): >>45071910 #>>45071948 #>>45072345 #
1. f33d5173 ◴[] No.45071910[source]
It's more the style of setting up contrasts that's the real LLM tell. That they happen to use a typographic mark that most people don't know how to type is just fuel on the fire.
replies(4): >>45072153 #>>45072298 #>>45072695 #>>45073079 #
2. DonHopkins ◴[] No.45072153[source]
You are absolutely correct.
3. londons_explore ◴[] No.45072298[source]
Anyone who types in MS Word for the improved spell checker and then copies their comment to a browser will automatically get hyphens changed to em-dashes.
replies(1): >>45074234 #
4. pxc ◴[] No.45072695[source]
Em-dashes are only incidentally related to contrasting statements like that, too. My main use of them is quasi-parenthetical interpolation. It can be nice when you want more emphasis on the aside, or just to avoid using parens or commas if you started writing something that already uses them.
replies(1): >>45075382 #
5. DiscourseFan ◴[] No.45073079[source]
The fact that it's not very useful for the forms of writing most people participate in nowadays--short form responses that are heavily contextual. Even longer form writing is often labored over--people use LLMs for outdated types of communication, like long-winded emails or school papers.

Idk, working in the AI space, I've started to write very succinctly and straight to the point, maybe as a counterweight to the often overly flattering, verbose forms of prose that the LLMs employ. I pay close attention to every word and try to never write more than is necessary.

replies(1): >>45073169 #
6. michaelt ◴[] No.45073169[source]
Less words maybe good if useless filler gone.

But what if need more words for complicated idea?

Short message easy if just 'orange man good' or 'orange man bad' but what if want to explain reason also? Dumb down? What if discussion too dumb already?

7. layer8 ◴[] No.45074234[source]
This is configurable and can be turned off.
8. Terretta ◴[] No.45075382[source]
My usage is not just parentheticals—when they're used like this—it's ironically continuations — a turn the sentence takes but not really standalone.

And the continuations… Honestly? They'll never <|im_end|>.

// • Chronic option-dash and option-shift-dash user, option-[ or option-shift-[ as well as option-] and option-shift-] — not to mention option-8 and option-; …