Try it here (you may have to create a Google Cloud project, but you don't have to enable billing or start the free trial):
https://console.cloud.google.com/bigquery?p=bigquery-public-...
Click on the `+` (white over blue background) in the tab bar at the top that says "SQL query" on popup, and type the following (I use the GoogleSQL pipe syntax (https://cloud.google.com/bigquery/docs/reference/standard-sq... / https://news.ycombinator.com/item?id=41347188) below, but you can also use standard SQL if you prefer):
FROM `bigquery-public-data.hacker_news.full`
|> WHERE type = 'comment' AND timestamp < '2022-11-30'
|> AGGREGATE COUNT(*) AS total, COUNTIF(text LIKE '%—%') AS with_em GROUP BY `by`
|> EXTEND with_em / total AS fraction_with_em
|> ORDER BY fraction_with_em DESC
|> WHERE total > 100 AND fraction_with_em > 0.1
(I'm in place 47 of the 516 results, with 0.29 of my comments (258 of 875) having an em dash in them.)
Edit: As you also asked about timestamps:
FROM `bigquery-public-data.hacker_news.full`
|> WHERE type = 'comment' AND timestamp < '2022-11-30'
|> EXTEND text LIKE '%—%' AS has_em
|> AGGREGATE
COUNT(*) AS total,
COUNTIF(has_em) AS with_em,
MIN(timestamp) AS first_comment_timestamp,
MIN(IF(has_em, timestamp, NULL)) AS first_em_timestamp,
TIMESTAMP_SECONDS(CAST(AVG(time) AS INT64)) AS avg_comment_timestamp,
TIMESTAMP_SECONDS(CAST(AVG(IF(has_em, time, NULL)) AS INT64)) AS avg_em_timestamp,
GROUP BY `by`
|> EXTEND with_em / total AS fraction_with_em
|> ORDER BY fraction_with_em DESC
|> WHERE total > 100 AND fraction_with_em > 0.1
for most people the average timestamp is just the midpoint of when they started posting (with em dashes) and the cutoff date of 2022-11-30, and the top-place user zmgsabst stands out for having started only in late January 2022.