
358 points | tkgally | 1 comment

The use of the em dash (—) now raises suspicions that a text might have been AI-generated. Inspired by a suggestion from dang [1], I created a leaderboard of HN users according to how many of their posts before November 30, 2022—that is, before the release of ChatGPT—contained em dashes. Dang himself comes in at number 2—by a very slim margin.

Credit to Claude Code for showing me how to search the HN database through Google BigQuery and for writing the HTML for the leaderboard.

[1] https://news.ycombinator.com/item?id=45053933
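
A minimal sketch of what the underlying query might look like against the public bigquery-public-data.hacker_news.full dataset (the post doesn't include its exact query; restricting to comments and the LIKE pattern are assumptions borrowed from the query quoted below):

  -- Hypothetical sketch: per-user counts of em-dash-containing comments before the ChatGPT release
  SELECT
    `by` AS user,
    COUNTIF(text LIKE '%—%') AS em_dash_comments,
    COUNT(*) AS total_comments
  FROM `bigquery-public-data.hacker_news.full`
  WHERE type = 'comment'
    AND timestamp < '2022-11-30'
  GROUP BY user
  ORDER BY em_dash_comments DESC
  LIMIT 100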

dns_snek:
Slightly tweaked: a leaderboard of em-dash-containing comments after the ChatGPT release, limited to users who used them in fewer than 1% of their comments before the release and who posted at least 200 comments both before and after it. Data is current as of August 28th.

Of course, this doesn't mean they're using ChatGPT either; they could have switched devices or simply started using em dashes because they felt like it.

  #   user           before_chatgpt after_chatgpt  
  1   fao_           9/1777 (1 %)   36/225 (16 %)
  2   tlogan         1/962 (0 %)    59/399 (15 %)
  3   whynotminot    1/250 (0 %)    36/356 (10 %)
  4   unclebucknasty 13/2566 (1 %)  38/378 (10 %)
  5   iLemming       0/793 (0 %)    61/628 (10 %)
  6   nostrebored    10/1045 (1 %)  32/331 (10 %)
  7   freeone3000    0/2128 (0 %)   74/791 (9 %) 
  8   pdabbadabba    6/932 (1 %)    20/225 (9 %) 
  9   thebooktocome  4/632 (1 %)    18/208 (9 %) 
  10  tnecniv        0/671 (0 %)    34/446 (8 %) 
  11  dkersten       39/5092 (1 %)  24/318 (8 %) 
  12  stared         8/1565 (1 %)   29/392 (7 %) 
  13  ETH_start      3/385 (1 %)    75/1029 (7 %)
  14  tcbawo         2/792 (0 %)    15/218 (7 %) 
  15  jbm            2/406 (0 %)    22/350 (6 %) 
Query [2]:

  -- Per-user em-dash and total comment counts, split before/after the ChatGPT release date
  WITH by_user AS (
    SELECT
      `by` AS user,
      COUNTIF(text LIKE '%—%') AS match_count,
      COUNT(*) AS total_count,
      (timestamp >= '2022-11-30') AS after_chatgpt
    FROM `bigquery-public-data.hacker_news.full` 
    WHERE type = 'comment'
    GROUP BY user, after_chatgpt
  ),
  -- One row per user; keep only users with at least 200 comments in each period
  combined AS (
    SELECT
      user,
      MAX(IF(NOT after_chatgpt, match_count, 0)) AS match_before_chatgpt,
      MAX(IF(NOT after_chatgpt, total_count, 0)) AS total_before_chatgpt,
      MAX(IF(after_chatgpt, match_count, 0)) AS match_after_chatgpt,
      MAX(IF(after_chatgpt, total_count, 0)) AS total_after_chatgpt,
    FROM by_user
    GROUP BY user
    HAVING total_before_chatgpt >= 200 AND total_after_chatgpt >= 200
  ),
  -- Em-dash fraction of comments in each period
  with_fractions AS (
    SELECT
      *,
      SAFE_DIVIDE(match_before_chatgpt, total_before_chatgpt)  AS fraction_before_chatgpt,
      SAFE_DIVIDE(match_after_chatgpt, total_after_chatgpt) AS fraction_after_chatgpt
    FROM combined
  )
  -- Users under 1% before ChatGPT, ranked by post-ChatGPT em-dash fraction
  SELECT
    user,
    FORMAT('%d/%d (%.0f %%)', match_before_chatgpt, total_before_chatgpt, ROUND(fraction_before_chatgpt*100)) AS before_chatgpt,
    FORMAT('%d/%d (%.0f %%)', match_after_chatgpt, total_after_chatgpt, ROUND(fraction_after_chatgpt*100)) AS after_chatgpt
  FROM with_fractions
  WHERE fraction_before_chatgpt < 0.01
  ORDER BY fraction_after_chatgpt DESC
  LIMIT 15
[1] https://news.ycombinator.com/item?id=45072937

[2] https://console.cloud.google.com/marketplace/product/y-combi...

stavros:
I think for this one you should rank by absolute, rather than relative, increase. First place went from 9 to 36, whereas second went from 1 to 59; the number of em-dash comments they wrote before ChatGPT shouldn't be relevant, I think.

It does need some normalization for people who post very few comments, but it feels more fair this way.
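
For illustration, the final SELECT in dns_snek's query could be swapped for an absolute-increase ranking along these lines (a sketch reusing the with_fractions CTE above; how best to normalize for users with few comments is left open):

  -- Sketch: rank by absolute increase in em-dash comment count instead of post-ChatGPT fraction
  SELECT
    user,
    match_after_chatgpt - match_before_chatgpt AS em_dash_increase,
    FORMAT('%d/%d', match_before_chatgpt, total_before_chatgpt) AS before_chatgpt,
    FORMAT('%d/%d', match_after_chatgpt, total_after_chatgpt) AS after_chatgpt
  FROM with_fractions
  WHERE fraction_before_chatgpt < 0.01
  ORDER BY em_dash_increase DESC
  LIMIT 15

The existing HAVING total_before_chatgpt >= 200 AND total_after_chatgpt >= 200 filter already provides some of the normalization stavros asks for, since users with very few comments never reach the leaderboard.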