(brainbaking.com)

253 points akyuu | 4 comments | 16 Nov 25 13:12 UTC

bo1024 ◴[16 Nov 25 16:18 UTC] No.45946196[source]▶

I wonder if a proof of work protocol is a viable solution. To GET the page, you have to spend enough electricity to solve a puzzle. The question is whether the threshold could be low enough for typical people on their phones to access the site easily, but high enough that mass scraping is significantly reduced.
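
For concreteness, here is a minimal hashcash-style sketch of the kind of puzzle described above. It is an illustration under assumptions, not a description of any deployed system: the function names (`solve`, `verify`, `leading_zero_bits`), the use of SHA-256, and the 8-byte nonce encoding are all choices made for this example. The server would hand out a random `challenge`, the client searches for a nonce, and the server checks the result with a single hash.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the number of leading zero bits in a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash has `difficulty` leading zero bits.

    Expected work is about 2**difficulty hash evaluations.
    """
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical per-request challenge
    nonce = solve(challenge, 12)  # roughly 4096 hashes on average
    print(verify(challenge, nonce, 12))
```

The open question in the comment is the choice of `difficulty`: each extra bit doubles the expected client work, so the gap between "fine on a phone" and "expensive at scraping scale" has to fit within a few bits.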

replies(3): >>45946275 #>>45946380 #>>45946409 #

1. kalavan ◴[16 Nov 25 16:26 UTC] No.45946275[source]▶

>>45946196 #

There's this paper from 2004: "Proof-of-Work Proves Not to Work": https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf

The conclusion back then was that it's impossible to set a threshold that is both low enough for legitimate users and high enough to deter abuse.

You need some other mechanism that can distinguish bad traffic from good (even if imperfectly), and then adjust the threshold based on it. See, for instance, "Proof of Work can Work": https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/lu...
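
As a rough sketch of what "adjust the threshold" could look like (this is not the scheme from either paper, just an illustration): assume some imperfect classifier produces a suspicion score between 0 and 1 for each request, and map that score to a puzzle difficulty in bits. The names `difficulty_for` and `expected_hashes`, the score itself, and the 8 to 24 bit range are all hypothetical.

```python
def difficulty_for(score: float, min_bits: int = 8, max_bits: int = 24) -> int:
    """Map a suspicion score in [0, 1] to a puzzle difficulty in bits.

    The score is assumed to come from some separate, imperfect traffic
    classifier; values outside [0, 1] are clamped.
    """
    score = min(1.0, max(0.0, score))
    return min_bits + round(score * (max_bits - min_bits))


def expected_hashes(bits: int) -> int:
    """Average number of hash evaluations needed for a puzzle of `bits` difficulty."""
    return 2 ** bits


if __name__ == "__main__":
    for score in (0.0, 0.5, 1.0):
        bits = difficulty_for(score)
        print(score, bits, expected_hashes(bits))
```

Because the cost is exponential in the number of bits, even a noisy score separates traffic sharply: with these example bounds, a request scored 1.0 pays about 65,536 times more work than one scored 0.0.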

replies(2): >>45946369 #>>45946938 #

2. bo1024 ◴[16 Nov 25 16:37 UTC] No.45946369[source]▶

>>45946275 (TP) #

Thanks for these references! I imagine the numbers would be entirely different in our context (20 years later and web serving, not email sending). And the idea of spammers using botnets (therefore not paying for compute themselves) would be less relevant to LLM scraping. But I’ll try to check for forward references on these.

replies(1): >>45947078 #

3. beeflet ◴[16 Nov 25 17:50 UTC] No.45946938[source]▶

>>45946275 (TP) #

Good links, but this is just for email and relies on some (admittedly) pretty lofty assumptions.

4. kalavan ◴[16 Nov 25 18:08 UTC] No.45947078[source]▶

>>45946369 #

> And the idea of spammers using botnets (therefore not paying for compute themselves) would be less relevant to LLM scraping.

It's possible that the services that reward users for running proxies (or that are bundled with mobile apps with a notice buried in the license) would start rewarding, or hiding, compute services as well. There's currently no money in it because proof-of-work is so rare, but if that changes, their strategy might too.

The internet is no longer a safe haven