
371 points timqian | 4 comments
lucb1e ◴[] No.17470583[source]
A data dump would have been much nicer. I don't want to have to install a package just to run grep on a dataset. The tool should only be for updating the dataset.
replies(2): >>17470929 #>>17471117 #
1. kozikow ◴[] No.17471117[source]
HN comments are in BigQuery. It's a simple query to extract that data; I just wrote it: https://bigquery.cloud.google.com/savedquery/1008801496706:d... It seems to be outdated, though...
replies(2): >>17471592 #>>17473193 #
2. cue232s ◴[] No.17471592[source]
What's outdated? The dataset?
replies(1): >>17474822 #
3. minimaxir ◴[] No.17473193[source]
The `full` table has comments up to a recent date (May 19th). You just have to add a filter on `type = "comment"`.

https://bigquery.cloud.google.com/table/bigquery-public-data...

4. kozikow ◴[] No.17474822[source]
Never mind, see the other comment by minimaxir. You have to use the table `bigquery-public-data:hacker_news.full` rather than `comments` for up-to-date results.
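
For anyone following along, here is a minimal sketch of the kind of query the comments above describe: read from the `full` table and filter on `type = "comment"`. It only builds the SQL text and does not contact BigQuery. The table name and the filter come from the thread; the selected column names (`id`, `by`, `text`), the function name `build_comment_query`, and the `LIMIT` are assumptions made for illustration, so check them against the table schema before relying on them. The table is written in the dotted, backtick-quoted form that BigQuery standard SQL expects, rather than the colon form quoted above.

```python
# Sketch only: builds the SQL text; it does not run anything against BigQuery.
# The table name and the type filter come from the thread.
# The column names (id, by, text) are assumptions about the table's schema.

# Standard-SQL spelling of bigquery-public-data:hacker_news.full
TABLE = "bigquery-public-data.hacker_news.full"


def build_comment_query(limit: int = 100) -> str:
    """Return a standard-SQL query that selects only comment rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        # `by` is a reserved word in BigQuery, so it is backtick-quoted.
        'SELECT id, `by`, text\n'
        f'FROM `{TABLE}`\n'
        'WHERE type = "comment"\n'
        f'LIMIT {int(limit)}'
    )


if __name__ == "__main__":
    print(build_comment_query(10))
```

The resulting string can be pasted into the BigQuery console with standard SQL enabled, or passed to whichever client you normally use.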