707 points namukang | 12 comments
1. moritonal ◴[] No.29257791[source]
Whilst nice, how is this going to handle the changing nature of the web? It's nice that it detects "lists" and such, but a few changes to CSS is going to trash that automation right?

I'm also fairly sure you'll break (either directly, or on a user's behalf) a few EULAs that specifically ban scraping.

replies(2): >>29258424 #>>29260327 #
2. kreeben ◴[] No.29258424[source]
Didn't this case [0] set a precedent that "scraping is not against the law" irregardless of EULA?

[0] https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn

replies(4): >>29258469 #>>29258707 #>>29259962 #>>29260430 #
3. wccrawford ◴[] No.29258469[source]
"using data that is publicly available"

If the user is logged in, that data may not be publicly available, and the EULA would still apply.

4. moritonal ◴[] No.29258707[source]
So it was proven that it's not a criminal offence to scrape a website, but a website is still well within its rights to ban you from doing so.
replies(1): >>29261122 #
5. detuur ◴[] No.29259962[source]
This might be true in the USA, but the EU has a thing called database rights[0]. Essentially, any collection of data can under certain circumstances be protected under database rights, which prevents other parties from copying (parts of) it. This originally was created to protect such things as phone books and other directories, but when I was a student (I don't remember the context anymore), they specifically warned us that scraping certain websites would violate their database rights, and thus be illegal. So using scrapers in the EU is something you should be very careful with, especially if your business depends on it.

[0] https://en.wikipedia.org/wiki/Database_right

6. baby ◴[] No.29260327[source]
> but a few changes to CSS is going to trash that automation right

hence why it's nice to have that extension to click through the UI rather than figure out how to parse things no?

7. catskul2 ◴[] No.29260430[source]
Pedantry: regardless
replies(1): >>29260517 #
8. kreeben ◴[] No.29260517{3}[source]
You pedantic piece of... nah, I'm just kidding. Thank you. I actually learned English by watching Clint Eastwood, Charles Bronson and Sylvester Stallone movies, so my grammar might be slightly off from time to time, but Google actually agrees with me when I say: irregardless == regardless.
replies(1): >>29260651 #
9. Jugurtha ◴[] No.29260651{4}[source]
https://en.wikipedia.org/wiki/Irregardless

https://www.merriam-webster.com/dictionary/irregardless

I dislike that word more than I dislike "nucular". Like diarrhea, anyone can let it slip.

replies(1): >>29260684 #
10. kreeben ◴[] No.29260684{5}[source]
Ah, so people've been making this mistake for over two hundred years but thanks to people like you, this misuse of language has been all but eradicated?

Radicated?

;)

replies(1): >>29260935 #
11. Jugurtha ◴[] No.29260935{6}[source]
That is quite the pun!

I tend to remind people who think that this is an error that, although I share their dislike...

- There is a case for the word and it predates us

- Languages are dynamic and today's "correct" spelling is yesterday's "erroneous" spelling.

I thought until recently that the spelling was "simply incorrect" until I found out there was more to it. It therefore is a reminder to myself as well.

12. ◴[] No.29261122{3}[source]