650 points Stratoscope | 1 comment
A_D_E_P_T ◴[] No.43497927[source]
AFAIK most computer keyboards don't have an em dash key. Rather than hit ALT+0151 every time, I've always just strung together two hyphens, like: --

Absolutely proper and correct use of em dashes, en dashes, and hyphens is, to me, the most obvious tell of the LLM writer. In fact, I think that you can use it to date internet writing in general. For it seems to me that real em dashes were uncommon pre-2022.

1. Quailman84 ◴[] No.43506419[source]
For a while, em dashes were really popular among LLM enthusiasts because of the idea that they would encourage the LLM to draw from training data that contained em dashes, which was typically higher-quality text written by a professional writer or by somebody with a professional editor. Subjectively, I think it worked. I suspect that the LLMs trained to be used as chatbots were fine-tuned to use the em dash liberally for that reason. Now, after a few generations of these models, I think the em dash is starting to have the opposite effect: it draws from "slop" training data written by other LLMs rather than from well-written human text.