
253 points akyuu | 10 comments
1. bo1024 ◴[] No.45946196[source]
I wonder if a proof of work protocol is a viable solution. To GET the page, you have to spend enough electricity to solve a puzzle. The question is whether the threshold could be low enough for typical people on their phones to access the site easily, but high enough that mass scraping is significantly reduced.
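
For concreteness, here is a minimal hashcash-style sketch of what such a puzzle could look like. The function names, the use of SHA-256, and the 12-bit difficulty are all illustrative assumptions, not a description of any deployed system. The client searches for a nonce, and the server checks it with a single hash.

```python
import hashlib
import os
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    for nonce in count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash is enough to check the claimed work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


# Hypothetical flow: the server issues a random challenge with each request.
challenge = os.urandom(16)
nonce = solve(challenge, 12)  # 12 bits is an arbitrary illustrative threshold
assert verify(challenge, nonce, 12)
```

The asymmetry (many hashes to solve, one to verify) is the whole point; the open question is where to set `difficulty_bits`.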
replies(3): >>45946275 #>>45946380 #>>45946409 #
2. kalavan ◴[] No.45946275[source]
There's this paper from 2004: "Proof-of-Work Proves Not to Work": https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf

The conclusion back then was that it's impossible to make a threshold that is both low enough and high enough.

You need some other mechanism that can distinguish bad traffic from good (even if imperfectly), and then adjust the threshold based on it. See, for instance, "Proof of Work can Work": https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/lu...
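
As a rough sketch of the "adjust the threshold" idea (not the scheme from either paper; the suspicion score, the function name, and the bit ranges are assumptions made up for illustration), the difficulty could scale with how suspicious the traffic looks:

```python
def difficulty_bits(suspicion: float, base_bits: int = 10, max_extra_bits: int = 12) -> int:
    """Map a suspicion score in [0, 1] to a puzzle difficulty in bits.

    Each extra bit doubles the expected client work, so the cost grows
    exponentially with the score. Scores outside [0, 1] are clamped.
    """
    clamped = min(1.0, max(0.0, suspicion))
    return base_bits + round(clamped * max_extra_bits)


# The score itself would come from some separate, imperfect classifier.
for score in (0.0, 0.5, 1.0):
    bits = difficulty_bits(score)
    print(f"suspicion={score:.1f}: {bits} bits, about {2 ** bits:,} hashes expected")
```

Traffic that looks normal pays close to nothing, while traffic that looks like scraping pays thousands of times more, which sidesteps the single-threshold problem.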

replies(2): >>45946369 #>>45946938 #
3. bo1024 ◴[] No.45946369[source]
Thanks for these references! I imagine the numbers would be entirely different in our context (20 years later and web serving, not email sending). And the idea of spammers using botnets (therefore not paying for compute themselves) would be less relevant to LLM scraping. But I’ll try to check for forward references on these.
replies(1): >>45947078 #
4. tonyhart7 ◴[] No.45946380[source]
I'm sorry, but it literally doesn't work, as the other commenter cited,

because if you set the requirement like that, it basically cancels out the other effect.

5. IshKebab ◴[] No.45946409[source]
I feel like it could work. If you think about it, you need the cost to the client to be greater than the cost to the server. As long as that is true the server shouldn't mind about increased traffic because it's making a profit!

Very crudely, if you think that a request costs the server ~10ms of compute time and a phone is 30x slower, then you'd need 300ms of client compute time to equal it, which seems very reasonable.
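
To spell that arithmetic out (a sketch only; the 10ms and 30x figures are the crude guesses above, and the attacker speedup is a hypothetical parameter):

```python
def break_even_client_ms(server_cost_ms: float, phone_slowdown: float) -> float:
    """Phone time needed to do the same amount of work the server spends on one request."""
    return server_cost_ms * phone_slowdown


def attacker_ms(phone_ms: float, speedup_over_phone: float) -> float:
    """Time the same puzzle takes on hardware that is faster than a phone."""
    return phone_ms / speedup_over_phone


phone_ms = break_even_client_ms(server_cost_ms=10, phone_slowdown=30)
print(phone_ms)  # 10 * 30 = 300 ms on the phone

# Assumption: a scraper on server-class hardware is 30x faster than the phone,
# so the same puzzle costs it only about as much as the request costs the server.
print(attacker_ms(phone_ms, speedup_over_phone=30))
```

So a puzzle tuned to phones only reaches parity against server-class hardware, and anything faster than that pays less than the server does.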

The only problem is you would need a cryptocurrency that a) lets you verify tiny chunks of work, b) can't be done faster on other hardware than you can do it on a phone, and c) lets a client mine money without being able to actually spend it ("homomorphic mining"?).

I don't know if anything like that exists but it would be an interesting problem to solve.

replies(2): >>45947119 #>>45949582 #
6. beeflet ◴[] No.45946938[source]
Good links, but this is just for email and relies on some (admittedly) pretty lofty assumptions.
7. kalavan ◴[] No.45947078{3}[source]
> And the idea of spammers using botnets (therefore not paying for compute themselves) would be less relevant to LLM scraping.

It's possible that the services that reward users for running proxies (or are bundled with mobile apps with a notice buried in the license) would start rewarding/hiding compute services as well. There's currently no money in it because proof-of-work is so rare, but if that changes, their strategy might too.

8. beeflet ◴[] No.45947119[source]
The problem is that the attacker isn't using a phone, they are using some type of specialized hardware.

I still think it is possible with some customized variant of RandomX. The server could even make a bit of money by acting as a mining pool, forcing the clients to mine a certain block template. It's just that it would need to be installed as a browser plugin or something; it wouldn't be efficient running within a page.

Also, the verification process for RandomX is still pretty intensive, so there is a high minimum bar for where it would be feasible.

replies(1): >>45951502 #
9. bo1024 ◴[] No.45949582[source]
I wasn't picturing collecting payments from visitors (presumably with cryptocurrency), but instead just asking visitors to burn energy. It would be socially better to collect payments, but we can't assume every visitor is set up with a cryptocurrency wallet.
10. IshKebab ◴[] No.45951502{3}[source]
> The problem is that the attacker isn't using a phone, they are using some type of specialized hardware.

Yeah, I covered that in b) in my comment.