358 points by tkgally | 1 comment

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users ranked by how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in at number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933

rasse | No.45072058
How about en dash usage? Has that been used as a similar false indicator?
thomasm6m6 | No.45072329
OpenAI’s o3 was big on en dashes—one time it produced a Deep Research result containing >200 of them. I’m not aware of any other LLM using them commonly, though. I’d guess humans use them even less often; I don’t think Apple auto-inserts en dashes, and very few people (myself being one) are pedantic enough to bother.

On the other hand, I don’t think o3 was ever a common choice among people copying from LLMs, so en dashes remain infrequent regardless.
aspect0545 | No.45072593
In German, en dashes are more common than em dashes. I’ve been using them regularly for at least 20 years, in both German and English texts. I never liked it when people just threw in an ordinary hyphen instead of an en dash, but few people notice the difference.
JimDabell | No.45072834
Yes, this is regional – British usage tends to be an en dash surrounded by spaces, whereas American usage tends to be an em dash with no spaces.
lostlogin | No.45072913
All this has me thinking. Is the em-dash like an accent for machines?
JimDabell | No.45073148
I’m not sure about “accent”, but I have described their heavy overuse of certain things as a verbal tic before.