
371 points timqian | 9 comments
1. lucb1e ◴[] No.17470583[source]
A data dump would have been much nicer; I don't want to have to install a package just to run grep on a dataset. The tool should only be needed for updating the dataset.
replies(2): >>17470929 #>>17471117 #
2. parhamn ◴[] No.17470929[source]
This attitude, which I see all the time on HN, absolutely irks me (e.g. the whole 'A graph of programming languages connected through compilers' thread). Too often people post something fun and cool they built, and readers complain about some random feature they think would make it better. It's not like OP's goal was to solve every little problem with graphing terms in HN Job threads. There should really be a rule against this sort of critique.

It's one thing to make a feature suggestion to be helpful to the author and a completely different thing to provide armchair critique like this.

The tool they wrote is to do the job they wanted to do. If you want something else, build it. If it doesn't satisfy your needs or you're too lazy to fork it, move on.

With that said, the data (text and HTML) is literally in the assets dir of the repository, which should make it easier for you to build your version of the tool. Did you bother looking?

replies(2): >>17471038 #>>17471116 #
3. jcims ◴[] No.17471038[source]
It feels shameful when you see it directed at others, but then if you post your own stuff you don't want the feeling that the community is pulling punches just to play nice.

Fortunately, most of these things are open source, so there's always the 'well, submit a PR' response. It's been so long since I slung HTML that I am fairly certain that if I attempted it for a Show HN, folks would be so distracted by how bad it is that they wouldn't pay attention to the utility of it.

replies(1): >>17471836 #
4. lucb1e ◴[] No.17471116[source]
I did actually bother to look in src, but for some reason didn't check out assets. Now that you mention it, it seems quite obvious, not sure how I missed it.
5. kozikow ◴[] No.17471117[source]
HN comments are in BigQuery. It's a simple query to extract that data; I just wrote it: https://bigquery.cloud.google.com/savedquery/1008801496706:d... . It seems to be outdated though...
replies(2): >>17471592 #>>17473193 #
6. cue232s ◴[] No.17471592[source]
What's outdated? The dataset?
replies(1): >>17474822 #
7. ◴[] No.17471836{3}[source]
8. minimaxir ◴[] No.17473193[source]
The `full` table has comments up to recently (May 19th). You just have to add a filter on `type = "comment"`.

https://bigquery.cloud.google.com/table/bigquery-public-data...

9. kozikow ◴[] No.17474822{3}[source]
Never mind, see the other comment by minimaxir. You have to use the table bigquery-public-data:hacker_news.full rather than the comments table for up-to-date results.
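
For anyone who wants to try the approach described above, here is a minimal sketch of the query in Python. It only builds the SQL text; it does not run it. The table name and the `type` filter come from the comments above. The selected columns (`id`, `text`), the `LIMIT` clause, and the helper name `build_comment_query` are assumptions for illustration, so check the table schema before relying on them.

```python
# Table name as given in the thread, written in standard-SQL dot notation.
TABLE = "bigquery-public-data.hacker_news.full"


def build_comment_query(limit: int = 100) -> str:
    """Return a standard-SQL query that keeps only comment rows.

    The column names `id` and `text` are assumed, not confirmed by the thread.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT id, text\n"
        f"FROM `{TABLE}`\n"
        "WHERE type = 'comment'\n"  # the filter minimaxir describes
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Paste the printed query into the BigQuery console, or pass it to a client.
    print(build_comment_query(10))
```

The query string can be pasted into the BigQuery web console as-is, which avoids installing anything locally.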