How to Use Em Dashes (—), En Dashes (–), and Hyphens (-)

Here's an easy, if not always precise, way to remember:

* Hyphens connect things, such as compound words: double-decker, cut-and-dried, 212-555-5555.

* EN dashes span a range between things: Boston–San Francisco flight, 10–20 years. They not only connect the endpoints but also signal that everything in between is included. (Compare the last usage with the phone number example under Hyphens.)

* EM dashes break things, such as sentences or thoughts: 'What the—!'; A paragraph should express one idea—but rules are made to be broken.

Unicode has the original ASCII hyphen-minus (U+002D) as well as a dedicated hyphen (U+2010), other functional hyphens such as the soft hyphen (U+00AD) and the non-breaking hyphen (U+2011), a dedicated minus sign (U+2212), and some variations of minus such as subscript and superscript.

There's also the figure dash "‒" (U+2012), essentially a hyphen-minus that's the same width as digits and used aesthetically for typesetting, afaik. And don't overlook the two-em dash "⸺", the three-em dash "⸻", and the horizontal bar "―", the last of which is used like quotation marks!
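For anyone who wants to check which of these characters a piece of text actually contains, here is a minimal Python sketch using the standard `unicodedata` module. The `DASHES` string and the `describe` helper are names made up for this illustration; the code points are the ones mentioned above plus the non-breaking hyphen.

```python
import unicodedata

# Characters discussed above, written as escapes so they survive copy and paste.
# DASHES and describe() are illustrative names, not part of any library.
DASHES = "\u002d\u2010\u2011\u2012\u2013\u2014\u2015\u2212\u2e3a\u2e3b"

def describe(chars):
    """Return a (code point, official Unicode name) pair for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]

for code_point, name in describe(DASHES):
    print(code_point, name)
```

Pasting a suspect character into `describe()` is a quick way to tell a hyphen-minus from an en dash or a minus sign when the font makes them look alike.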

"There's also the figure dash…"

Re the last paragraph: dashes etc. are confusing for perhaps most of us who aren't, say, typesetters, myself included. I use EM dashes a lot, usually without a space between words and sometimes with spaces when I think the typography calls for it—or for extra emphasis.

Essentially, most of us guess the rules, and often this doesn't matter much, but it can in certain circumstances.

For example, take machine conversion or transliteration. The ASCII hyphen-minus is often used as a substitute for the Unicode minus sign because it's easy to type [it's my usual practice], and anyway many don't know there is an actual difference. Whilst a human will usually know the difference from its use or context, a machine may take the literal interpretation, which could lead to, say, a numerical calculation error.
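The mismatch can bite in either direction. One concrete case, sketched below in Python: the built-in `float()` accepts the ASCII hyphen-minus as a sign but rejects the Unicode minus sign, so text copied from a typeset source can fail to parse. The `parse_number` helper and the `MINUS_LIKE` table are hypothetical, and treating an en dash as a minus is an assumption that will not suit every input.

```python
# Hypothetical table: characters we ASSUME were meant as a minus sign.
MINUS_LIKE = {
    "\u2212": "-",  # MINUS SIGN
    "\u2013": "-",  # EN DASH, often typed in place of a minus (assumption)
}
_TABLE = str.maketrans(MINUS_LIKE)

def parse_number(text):
    """Parse a number after folding minus-like characters to hyphen-minus."""
    return float(text.strip().translate(_TABLE))

try:
    float("\u22125")  # Unicode minus sign followed by the digit 5
except ValueError:
    print("float() rejects the Unicode minus sign")

print(parse_number("\u22125"))
```

Folding everything to the hyphen-minus loses the distinction the typography was trying to make, so a step like this belongs at the parsing boundary, not in the stored text.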

This problem has annoyed me for a long while. Why is it that word processors and editors do not highlight these characters and query whether the usage is correct? Surely this ought not to be that difficult.
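As a rough idea of what such a check might look like, here is a small Python sketch that flags every hyphen-minus sitting directly between two digits, since that is where a range (en dash), a subtraction (minus sign), and a plain connector (phone number) are easiest to confuse. The rule and the function name `flag_ambiguous_hyphens` are assumptions for illustration; a real editor would need far more context than this.

```python
import re

# Assumption: a hyphen-minus directly between two digits deserves a second look.
DIGIT_HYPHEN_DIGIT = re.compile(r"(?<=\d)-(?=\d)")

def flag_ambiguous_hyphens(text, context=5):
    """Return (offset, surrounding text) for each hyphen-minus between digits."""
    hits = []
    for match in DIGIT_HYPHEN_DIGIT.finditer(text):
        start = max(0, match.start() - context)
        hits.append((match.start(), text[start:match.end() + context]))
    return hits

for offset, snippet in flag_ambiguous_hyphens("Call 212-555-5555 in 10-20 years"):
    print(offset, repr(snippet))
```

The sketch only reports candidates; deciding whether "10-20" is a range or a subtraction is still left to the human, which is the query step asked for above.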

Another example is Roman numerals. The average person will enter, say, an uppercase 'I' for the Roman numeral one. Here's a typical example, which is incorrect:

WWII

Here I entered the normal ASCII 'I' because it was too involved to find the correct Unicode character for Roman numeral one.
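For what it's worth, the two spellings are different strings to a machine, which the Python sketch below shows with the standard `unicodedata` module. It also shows that Unicode treats the dedicated numerals as compatibility characters: NFKC normalization folds U+2161 back into two Latin letters. The variable names and the `fold` helper are made up for this example.

```python
import unicodedata

typed = "WWII"          # ends with two ASCII capital letters I
dedicated = "WW\u2161"  # ends with U+2161 ROMAN NUMERAL TWO

def fold(text):
    """Apply NFKC, which replaces compatibility characters with plain equivalents."""
    return unicodedata.normalize("NFKC", text)

print(unicodedata.name(dedicated[-1]))
print(typed == dedicated)        # compared code point by code point
print(fold(dedicated) == typed)  # compared after compatibility folding
```

So a search or comparison that normalizes with NFKC first would treat both spellings alike, while a naive string comparison would not.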

I'd like to know what others who are in typography, machine learning, etc. think about this, and why word processors and editors don't have simple ergonomics that allow for easy selection of the correct character.

† On a related matter, you'll note I've used single quotes whereas mmooss uses double quotes. This tells me that mmooss is likely in the US whereas I'm not. Again, this is not really a major problem for humans, but it can be in transliteration, etc. Also, it's unclear (at least to me) what the default is for quoting quotes, i.e.: "" versus "' (right, I've refrained from using triple quotes).

Again, this seems country-specific, with, I believe, the US favoring double followed by single. Even when these rules are defined, do people strictly adhere to them?