253 points akyuu | 4 comments
bo1024 ◴[] No.45946196[source]
I wonder if a proof of work protocol is a viable solution. To GET the page, you have to spend enough electricity to solve a puzzle. The question is whether the threshold could be low enough for typical people on their phones to access the site easily, but high enough that mass scraping is significantly reduced.
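To make the idea concrete, here is a minimal hashcash-style sketch. It is an illustration under stated assumptions, not a protocol anyone in this thread has specified: the server hands out a `challenge`, the client searches for a `nonce` whose SHA-256 hash has at least `difficulty` leading zero bits, and the server checks the result with a single hash. The function names and the challenge format are hypothetical.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the number of zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in the byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce (expected cost about 2**difficulty hashes)."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= difficulty


# Hypothetical usage: the challenge would normally be random and tied to the request.
challenge = b"example-challenge:/index.html"
nonce = solve(challenge, 12)
print(verify(challenge, nonce, 12))
```

The asymmetry is the point: solving costs roughly 2**difficulty hashes on average, while verifying costs one. The open question raised above is whether any single `difficulty` value is tolerable on a phone and still expensive at scraping volume.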
replies(3): >>45946275 #>>45946380 #>>45946409 #
1. kalavan ◴[] No.45946275[source]
There's this paper from 2004: "Proof-of-Work Proves Not to Work": https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf

The conclusion back then was that it's impossible to make a threshold that is both low enough and high enough.

You need some other mechanism that can distinguish bad traffic from good (even if imperfectly), and then adjust the threshold based on it. See, for instance, "Proof of Work can Work": https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/lu...
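As a rough sketch of what "adjust the threshold" could look like (this is not the scheme from either paper, just an illustration): assume some separate classifier yields a `suspicion` score between 0 and 1 for each client, and map that score to a puzzle difficulty in bits. The function names, the score, and the `min_bits`/`max_bits` defaults are all hypothetical.

```python
def difficulty_for(suspicion: float, min_bits: int = 8, max_bits: int = 24) -> int:
    """Map a suspicion score in [0, 1] to a puzzle difficulty in leading zero bits.

    The score is assumed to come from some other, imperfect signal
    (rate limits, IP reputation, and so on); out-of-range values are clamped.
    """
    s = min(max(suspicion, 0.0), 1.0)
    return min_bits + round(s * (max_bits - min_bits))


def expected_hashes(bits: int) -> int:
    """Average number of hash attempts needed to solve a puzzle of this difficulty."""
    return 2 ** bits


for score in (0.0, 0.5, 1.0):
    bits = difficulty_for(score)
    print(score, bits, expected_hashes(bits))
```

Because the cost doubles with every extra bit, a modest shift in the score moves the work by orders of magnitude, so likely-good clients can be kept cheap while likely-bad ones pay far more.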

replies(2): >>45946369 #>>45946938 #
2. bo1024 ◴[] No.45946369[source]
Thanks for these references! I imagine the numbers would be entirely different in our context (20 years later, and web serving rather than email sending). And the idea of spammers using botnets (and therefore not paying for the compute themselves) would be less relevant to LLM scraping. But I’ll try to check for forward references on these.
replies(1): >>45947078 #
3. beeflet ◴[] No.45946938[source]
Good links, but this is just for email and relies on some (admittedly) pretty lofty assumptions.
4. kalavan ◴[] No.45947078[source]
> And the idea of spammers using botnets (and therefore not paying for the compute themselves) would be less relevant to LLM scraping.

It's possible that the services that reward users for running proxies (or that are bundled with mobile apps with a notice buried in the license) would start rewarding/hiding compute services as well. There's currently no money in it because proof-of-work is so rare, but if that changes, their strategy might too.