
358 points tkgally | 1 comment

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933

astahlx ◴[] No.45072529[source]
I started using em dashes in my academic career, after my advisor pointed me to the subtle differences. Since then, I have liked the em dash and used it a lot. In LaTeX it is easy to produce; just keep the spacing rules in mind. The Punctuation Guide is a nice reference on it: https://www.thepunctuationguide.com/
replies(1): >>45072663 #
globular-toast ◴[] No.45072663[source]
There are actually four different "dashes" in La/TeX: the hyphen (-), the en-dash (--), which is used for numeric ranges like 1--2, the em-dash (---) for punctuation, and the minus sign ($-$). Knuth talks about them in the TeXbook, which is good fun.
replies(1): >>45072713 #
pxc ◴[] No.45072713[source]
I think you can do all of those in plain text as well. There are Unicode characters for those dashes, and probably more.
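For anyone who wants to check, here is a minimal Python sketch that looks up the four dash-like characters in the Unicode database. The `DASHES` mapping and the `describe` helper are names made up for this illustration; only the standard `unicodedata` module and `str.isascii` (Python 3.7+) are assumed.

```python
import unicodedata

# Dash-like characters from the comment above, written as escapes so the
# snippet does not depend on the encoding of the source file.
# (DASHES and describe are hypothetical names for this sketch.)
DASHES = {
    "hyphen": "\u002d",      # the key on most keyboards
    "en dash": "\u2013",     # numeric ranges
    "em dash": "\u2014",     # punctuation
    "minus sign": "\u2212",  # mathematical minus
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for label, ch in DASHES.items():
        # isascii() is True only for code points below 128.
        print(f"{label:11} {describe(ch)}  ascii={ch.isascii()}")
```

Only the first of the four falls inside ASCII; the other three need a larger character set.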
replies(1): >>45073067 #
globular-toast ◴[] No.45073067[source]
Not in ASCII. My definition of plain text is roughly "the characters I have on my keyboard". Unicode is like a superset of all possible plain texts. Useful, but I really don't like my own files containing characters I can't (easily) type. If I regularly typed in another language I would acquire a keyboard for that language. I'm not even convinced typographical symbols like various dash types even belong in Unicode at all to be honest. It seems like you have to draw a very arbitrary line somewhere.
replies(1): >>45073134 #
Symbiote ◴[] No.45073134[source]
Drawing the line at "OK-ish for American English" is far too restrictive.

You can't write CO₂ or m², use a fraction like ½, claim © or mention a price in Euros or Pounds Sterling.

You can't even write major American place names (San José, Oʻahu).

replies(2): >>45077627 #>>45077657 #
pxc ◴[] No.45077627{3}[source]
I'm pretty sure © and ½ are in ASCII. I think é might be, too.

But anyway, I agree: there's no reason plain text shouldn't be rich.

replies(1): >>45078795 #
1. JdeBP ◴[] No.45078795{4}[source]
Wherever you learned ASCII from, it was very wrong. It probably made the common (although less common in the 21st century than in the 20th) erroneous conflation of ASCII and Latin-1, or IBM code page 437, or IBM code page 850.
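The conflation is easy to test. A small sketch, assuming only Python's built-in codecs (`ascii`, `latin-1`, `cp437`, `cp850`); the `encodable` and `support_table` helpers and the choice of sample characters are made up for this illustration.

```python
# Characters mentioned upthread, written as escapes:
# copyright sign, one half, e-acute, euro sign, em dash.
CHARS = ["\u00a9", "\u00bd", "\u00e9", "\u20ac", "\u2014"]
CODECS = ["ascii", "latin-1", "cp437", "cp850"]


def encodable(ch: str, codec: str) -> bool:
    """True if the character can be represented in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def support_table() -> dict:
    """Map each sample character to a {codec: bool} dictionary."""
    return {ch: {codec: encodable(ch, codec) for codec in CODECS} for ch in CHARS}


if __name__ == "__main__":
    for ch, row in support_table().items():
        cells = "  ".join(f"{codec}={ok}" for codec, ok in row.items())
        print(f"U+{ord(ch):04X}  {cells}")
```

None of the sample characters encodes as ASCII, which stops at code point 127. The copyright sign, one half, and e-acute do exist in Latin-1, which is probably where the impression that they are "in ASCII" comes from.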