358 points tkgally | 2 comments

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933
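The counting behind such a leaderboard can be sketched in a few lines. This is only an illustration in plain Python over hypothetical in-memory records, not the BigQuery query actually used (the post does not show it). The `(author, posted_at, text)` tuple layout, the function name `em_dash_leaderboard`, and the sample data are all assumptions.

```python
from collections import Counter
from datetime import datetime, timezone

EM_DASH = "\u2014"  # the em dash character, written as an escape
# Cutoff taken from the post: items before November 30, 2022 are counted.
CUTOFF = datetime(2022, 11, 30, tzinfo=timezone.utc)


def em_dash_leaderboard(posts, cutoff=CUTOFF):
    """Rank authors by how many of their pre-cutoff posts contain an em dash.

    `posts` is an iterable of (author, posted_at, text) tuples; this layout
    is hypothetical and not the real HN dataset schema.
    """
    counts = Counter()
    for author, posted_at, text in posts:
        # Count each post at most once, however many em dashes it contains.
        if posted_at < cutoff and EM_DASH in text:
            counts[author] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample data, for illustration only.
sample_posts = [
    ("alice", datetime(2015, 3, 1, tzinfo=timezone.utc), f"One thing{EM_DASH}and another"),
    ("alice", datetime(2021, 6, 10, tzinfo=timezone.utc), f"Yes{EM_DASH}really"),
    ("bob", datetime(2019, 1, 5, tzinfo=timezone.utc), "plain hyphen - here"),
    ("bob", datetime(2020, 2, 2, tzinfo=timezone.utc), f"an aside{EM_DASH}here"),
    ("carol", datetime(2023, 1, 15, tzinfo=timezone.utc), f"after{EM_DASH}cutoff"),
]

leaderboard = em_dash_leaderboard(sample_posts)
```

With the sample data, `alice` ranks first with two qualifying posts and `bob` second with one; `carol` is excluded because her only em-dash post falls after the cutoff.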

1. Havoc No.45077680
Confused by the year stats below - they show an increase much earlier than, say, the GPT-3 release date. So I'm guessing whatever is going on isn't just AI?
replies(1): >>45077743
2. gardnr No.45077743
From my perspective: that's the point of the web toy. It shows who was using these em dashes before they were likely to have been copied and pasted from ChatGPT (or generated through APIs). The em dash is widely identified as a single character that greatly increases the "smell" of a text as AI-generated.

It is novel to see which users were producing text with an em dash before the rise of AI slop. User 'derefr' was 5 years ahead of everyone.

I do wonder if there was some journalism CMS involved, or if these users figured out how to produce the character on their own.

EDIT: 'lynndotpy' has an explanation in this thread.