Show HN: Hacker News em dash user leaderboard pre-ChatGPT

(www.gally.net)

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.
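The post doesn't show the query, so here is a minimal Python sketch of just the counting step. It assumes the HN items have already been exported (for example from the public `bigquery-public-data.hacker_news` dataset; the post doesn't name a table) as hypothetical `(author, timestamp, text)` tuples. It ranks authors by how many of their pre-cutoff posts contain at least one U+2014 character:

```python
import html
from collections import Counter
from datetime import datetime, timezone

EM_DASH = "\u2014"  # U+2014 EM DASH

# Cutoff taken from the post: ChatGPT's release on November 30, 2022 (UTC assumed).
CUTOFF = datetime(2022, 11, 30, tzinfo=timezone.utc)


def em_dash_leaderboard(posts, cutoff=CUTOFF, top=10):
    """Rank authors by how many of their pre-cutoff posts contain an em dash.

    `posts` is an iterable of (author, timestamp, text) tuples, a hypothetical
    export of HN items; timestamps must be timezone-aware datetimes.
    """
    counts = Counter()
    for author, ts, text in posts:
        # Skip deleted items (no author/text) and anything on or after the cutoff.
        if author is None or text is None or ts >= cutoff:
            continue
        # HN item text is HTML, so unescape entities such as "&#8212;" or
        # "&mdash;" in case the dash was stored that way.
        if EM_DASH in html.unescape(text):
            counts[author] += 1  # one point per post, however many dashes it has
    return counts.most_common(top)


if __name__ == "__main__":
    sample = [
        ("alice", datetime(2021, 5, 1, tzinfo=timezone.utc), "A point&#8212;with an aside."),
        ("alice", datetime(2023, 1, 1, tzinfo=timezone.utc), "Too late \u2014 not counted."),
        ("bob", datetime(2020, 3, 2, tzinfo=timezone.utc), "Two\u2014dashes\u2014one post."),
        ("bob", datetime(2018, 1, 1, tzinfo=timezone.utc), "Only a hyphen - here."),
    ]
    print(em_dash_leaderboard(sample))
```

Counting posts rather than individual dashes means one dash-heavy comment doesn't outweigh many ordinary ones. Whether the real leaderboard scored it that way is an assumption.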

[1] https://news.ycombinator.com/item?id=45053933


tptacek [30 Aug 25 04:21 UTC] No.45071905

>>45071722 (OP)

The em-dash giveaway is an actual Unicode em-dash character, right? I had to learn LaTeX professionally to write a paper in the 1990s and have had a "---" habit ever since, and I've been wondering whether that's some kind of weird LLM tell now.
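In LaTeX source, `--` and `---` are plain ASCII hyphens that the standard fonts typeset as en and em dashes, so what ends up in a comment box is three hyphens, not U+2014. This is a minimal Python sketch of the difference, assuming a detector that only looks for the literal character. The post doesn't say exactly how its matching works, and `has_unicode_em_dash` and `tex_dashes_to_unicode` are illustrative names:

```python
EM_DASH = "\u2014"  # U+2014 EM DASH
EN_DASH = "\u2013"  # U+2013 EN DASH


def has_unicode_em_dash(text: str) -> bool:
    """True only if the literal U+2014 character is present."""
    return EM_DASH in text


def tex_dashes_to_unicode(text: str) -> str:
    """Map TeX-style dash ligatures to Unicode: '---' -> em dash, '--' -> en dash.

    '---' must be replaced first; otherwise it would become an en dash
    followed by a stray hyphen.
    """
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


if __name__ == "__main__":
    typed = "I picked up a habit---a LaTeX one---in the 1990s (pp. 10--12)."
    print(has_unicode_em_dash(typed))      # False: only ASCII hyphens
    converted = tex_dashes_to_unicode(typed)
    print(has_unicode_em_dash(converted))  # True after conversion
```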

1. f33d5173 [30 Aug 25 04:23 UTC] No.45071910

>>45071905

It's more the style of setting up contrasts that's the real LLM tell. That they happen to use a typographic mark most people don't know how to type just adds fuel to the fire.

2. DonHopkins [30 Aug 25 05:43 UTC] No.45072153

>>45071910 (TP)

You are absolutely correct.

3. londons_explore [30 Aug 25 06:11 UTC] No.45072298

>>45071910 (TP)

Anyone who types in MS Word for its better spell checker and then copies their comment into a browser will automatically get hyphens changed to em-dashes.

4. pxc [30 Aug 25 07:39 UTC] No.45072695

>>45071910 (TP)

Em-dashes are only incidentally related to contrasting statements like that, too. My main use of them is quasi-parenthetical interpolation. It can be nice when you want more emphasis on the aside, or just to avoid using parens or commas if you started writing something that already uses them.

5. DiscourseFan [30 Aug 25 08:53 UTC] No.45073079

>>45071910 (TP)

That's partly because it's not very useful for the forms of writing most people engage in nowadays--short-form responses that are heavily contextual. Even longer-form writing is often labored over--people use LLMs for outdated types of communication, like long-winded emails or school papers.

Idk, working in the AI space, I've started to write very succinctly and straight to the point, maybe as a counterweight to the often overly flattering, verbose forms of prose that the LLMs employ. I pay close attention to every word and try to never write more than is necessary.

6. michaelt [30 Aug 25 09:13 UTC] No.45073169

>>45073079

Less words maybe good if useless filler gone.

But what if need more words for complicated idea?

Short message easy if just 'orange man good' or 'orange man bad' but what if want to explain reason also? Dumb down? What if discussion too dumb already?

7. layer8 [30 Aug 25 12:56 UTC] No.45074234

>>45072298

This is configurable and can be turned off.

8. Terretta [30 Aug 25 15:18 UTC] No.45075382

>>45072695

My usage is not just parentheticals—when they're used like this—it's ironically continuations — a turn the sentence takes but not really standalone.

And the continuations… Honestly? They'll never <|im_end|>.

// • Chronic option-dash and option-shift-dash user, option-[ or option-shift-[ as well as option-] and option-shift-] — not to mention option-8 and option-; …
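For reference, this small lookup table shows what those Option-key combinations produce. It assumes the standard U.S. macOS keyboard layout, since other layouts map them differently. `MAC_OPTION_KEYS` and `describe` are illustrative names, not part of any API:

```python
import unicodedata

# Characters produced by the Option-key shortcuts mentioned above, assuming
# the standard U.S. macOS keyboard layout (other layouts differ).
MAC_OPTION_KEYS = {
    "option-dash": "\u2013",        # en dash
    "option-shift-dash": "\u2014",  # em dash
    "option-[": "\u201c",           # left double quotation mark
    "option-shift-[": "\u201d",     # right double quotation mark
    "option-]": "\u2018",           # left single quotation mark
    "option-shift-]": "\u2019",     # right single quotation mark
    "option-8": "\u2022",           # bullet
    "option-;": "\u2026",           # horizontal ellipsis
}


def describe(mapping):
    """Return one 'keys  U+XXXX  NAME' line per shortcut."""
    return [
        f"{keys:<18} U+{ord(ch):04X}  {unicodedata.name(ch)}"
        for keys, ch in mapping.items()
    ]


if __name__ == "__main__":
    print("\n".join(describe(MAC_OPTION_KEYS)))
```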
