
358 points tkgally | 1 comments

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933
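The counting rule described in the post can be sketched in a few lines of Python. This is only an illustration, not the query that was run on BigQuery: the `(author, date, text)` tuples, the function name `em_dash_leaderboard`, and the sample data are all hypothetical, and it assumes "before November 30, 2022" means strictly earlier than that date.

```python
from collections import Counter
from datetime import date

EM_DASH = "\u2014"          # the em dash character
CUTOFF = date(2022, 11, 30)  # ChatGPT release date mentioned in the post


def em_dash_leaderboard(posts, cutoff=CUTOFF):
    """Count, per author, the posts before `cutoff` that contain an em dash.

    `posts` is an iterable of (author, posted_on, text) tuples
    (a hypothetical shape chosen for this sketch).
    """
    counts = Counter()
    for author, posted_on, text in posts:
        if posted_on < cutoff and EM_DASH in text:
            counts[author] += 1
    # most_common() returns (author, count) pairs, highest count first
    return counts.most_common()


# Made-up sample data, for illustration only
sample_posts = [
    ("alice", date(2021, 5, 1), "Dashes \u2014 like this \u2014 are fun."),
    ("alice", date(2023, 1, 1), "Too late \u2014 not counted."),
    ("bob", date(2020, 3, 9), "Plain hyphen - only."),
    ("bob", date(2019, 7, 4), "One \u2014 here."),
    ("alice", date(2018, 2, 2), "Another \u2014 one."),
]

leaderboard = em_dash_leaderboard(sample_posts)
print(leaderboard)  # [('alice', 2), ('bob', 1)]
```

Note that a post with several em dashes still counts once, since the post describes counting posts rather than individual dashes.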

astahlx ◴[] No.45072529[source]
I started using em dashes during my academic career, after my advisor pointed out the subtle differences between the dashes. Since then, I have liked the em dash and used it a lot. In LaTeX it is easy to produce; just keep the spacing rules in mind. The Punctuation Guide is a nice reference on it: https://www.thepunctuationguide.com/
replies(1): >>45072663 #
globular-toast ◴[] No.45072663[source]
There are actually four different "dashes" in La/TeX: the hyphen (-), the en-dash (--), which is used for numeric ranges like 1--2, the em-dash (---) for punctuation, and the minus sign ($-$). Knuth talks about them in the TeXbook, which is good fun.
replies(1): >>45072713 #
pxc ◴[] No.45072713[source]
I think you can do all of those in plain text as well. There are Unicode characters for those dashes, and probably more.
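For reference, a small Python sketch that pairs each TeX input from the parent comment with what I take to be its closest Unicode counterpart and prints the code point and official character name. The pairing itself (in particular, using the ASCII hyphen-minus for the plain hyphen) is an assumption of this sketch; `TEX_DASHES` and `describe` are names made up here.

```python
import unicodedata

# TeX input -> closest Unicode character (pairing assumed for this sketch)
TEX_DASHES = {
    "-": "\u002d",    # hyphen (ASCII hyphen-minus)
    "--": "\u2013",   # en dash, for ranges
    "---": "\u2014",  # em dash, for punctuation
    "$-$": "\u2212",  # minus sign, math mode
}


def describe(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


table = {tex: describe(ch) for tex, ch in TEX_DASHES.items()}
for tex, label in table.items():
    print(f"{tex:>4}  ->  {label}")

# Only the plain hyphen survives in an ASCII-only file
ascii_only = [tex for tex, ch in TEX_DASHES.items() if ch.isascii()]
```

Three of the four characters sit outside ASCII, so whether they count as "plain text" depends on whether that term means ASCII or any Unicode text.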
replies(1): >>45073067 #
globular-toast ◴[] No.45073067[source]
Not in ASCII. My definition of plain text is roughly "the characters I have on my keyboard". Unicode is like a superset of all possible plain texts. Useful, but I really don't like my own files containing characters I can't (easily) type. If I regularly typed in another language I would acquire a keyboard for that language. To be honest, I'm not even convinced that typographical symbols like the various dash types belong in Unicode at all. It seems like you have to draw a very arbitrary line somewhere.
replies(1): >>45073134 #
Symbiote ◴[] No.45073134[source]
Drawing the line at "OK-ish for American English" is far too restrictive.

You can't write CO₂ or m², use a fraction like ½, claim ©, or mention a price in euros (€) or pounds sterling (£).

You can't even write major American place names (San José, Oʻahu).
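To make the point concrete, here is a minimal Python sketch that lists the characters in these examples that fall outside ASCII. The helper name `non_ascii_chars` is made up for this sketch, the strings are written with escapes so the code points are unambiguous, and the ʻokina in Oʻahu is assumed to be U+02BB.

```python
def non_ascii_chars(text):
    """Return (character, code point label) pairs for every non-ASCII character."""
    return [(ch, f"U+{ord(ch):04X}") for ch in text if not ch.isascii()]


examples = [
    "CO\u2082",        # subscript two
    "m\u00b2",         # superscript two
    "\u00bd",          # vulgar fraction one half
    "\u00a9",          # copyright sign
    "\u20ac",          # euro sign
    "\u00a3",          # pound sign
    "San Jos\u00e9",   # e with acute accent
    "O\u02bbahu",      # okina, assumed to be U+02BB
    "plain text",      # pure ASCII, for comparison
]

report = {s: non_ascii_chars(s) for s in examples}
for s, found in report.items():
    status = "ASCII only" if not found else ", ".join(code for _, code in found)
    print(f"{s!r}: {status}")
```

Every example except the last contains at least one character that an ASCII-only file cannot store.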

replies(2): >>45077627 #>>45077657 #
1. globular-toast ◴[] No.45077657{3}[source]
It's not too restrictive for me. I rarely need to write foreign place names or words (I'm British). Yeah, I use the £ symbol, so I'm not limiting myself to ASCII, just to what is on my keyboard (I have € too). I just don't really consider a file full of characters I can't type to be "plain text" merely because it's UTF-8, that's all.