
358 points | tkgally | 1 comment

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933

Lockal ◴[] No.45078291[source]
I think a slightly more interesting statistic would be to count only \w—\w. This excludes cases like "(—)" and emdashes surrounded by spaces, which is, apparently, what Russian-speaking users like to use. Also, it is a very old tradition to format page titles as <title>[Page name] — [Website name]</title>: depending on the language, this is a default setting for MediaWiki, WordPress, etc.
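As a rough illustration of that filter, here is a minimal Python sketch. The function names are hypothetical, and this is not the query behind the leaderboard. It uses lookarounds rather than a literal \w—\w so that a run like "a—b—c" counts both dashes, which is a small variation on the pattern as written. In Python 3, \w is Unicode-aware for str patterns, so Cyrillic letters count as word characters too.

```python
import re

EM_DASH = "\u2014"

# An em dash with a word character directly on each side, e.g. "2022\u2014that".
# Lookarounds do not consume the neighbouring characters, so overlapping
# cases such as "a\u2014b\u2014c" yield two matches instead of one.
TIGHT_EM_DASH = re.compile(rf"(?<=\w){EM_DASH}(?=\w)")


def count_tight_em_dashes(text: str) -> int:
    """Count em dashes that sit directly between two word characters."""
    return len(TIGHT_EM_DASH.findall(text))


def count_all_em_dashes(text: str) -> int:
    """Count every em dash, regardless of what surrounds it."""
    return text.count(EM_DASH)


if __name__ == "__main__":
    samples = [
        f"before November 30, 2022{EM_DASH}that is, before ChatGPT{EM_DASH}contained",
        f"({EM_DASH})",
        f"[Page name] {EM_DASH} [Website name]",
    ]
    for s in samples:
        print(count_tight_em_dashes(s), count_all_em_dashes(s), repr(s))
```

Comparing the two counts per user would show how much of a total comes from spaced or parenthesised dashes rather than the unspaced style.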
replies(1): >>45078565 #
n2d4 ◴[] No.45078565[source]
It's not just Russian speakers who put spaces around the emdash; the AP style guide does too.

Also, for what it's worth, UK style guides recommend endash + spaces (but many write emdash + spaces instead), and so do some other languages (e.g. German). There are more countries than just America and Russia!

replies(1): >>45080423 #
1. Lockal ◴[] No.45080423[source]
No, I mean that in a few Slavic languages the emdash replaces "is a / ist / est / es / ...", so you will see it in 99% of ru/be/uk Wikipedia articles *in the first sentence*. Coincidentally, in these languages the emdash must be surrounded by spaces (no exceptions).