Bot or human? Creating an invisible Turing test for the internet

(research.roundtable.ai)

141 points timshell | 4 comments | 25 Jun 25 15:00 UTC

imiric ◴[25 Jun 25 15:30 UTC] No.44378450

I applaud the effort. We need human-friendly CAPTCHAs, as much as they're generally disliked. They're the only solution to the growing spam and abuse problem on the web.

Proof-of-work CAPTCHAs work well for making bots expensive to run at scale, but they still rely on accurate bot detection. Avoiding both false positives and false negatives is crucial, yet no existing approach is reliable enough.

One comment re:

> While AI agents can theoretically simulate these patterns, the effort likely outweighs other alternatives.

For now. Behavioral and cognitive signals seem to work against the current generation of bots, but they will likely also be defeated as AI tools become cheaper and more accessible. It's only a matter of time until attackers can train a model on real human input, and until inference is cheap enough. Or simply until the benefit of using a bot against a specific target outweighs the costs.

So I think we will need a different detection mechanism. Maybe something from the real world, some type of ID, or even micropayments. I'm not sure, but it's clear that bot detection is on the opposite, and currently losing, side of the AI race.

1. chrismorgan ◴[25 Jun 25 17:03 UTC] No.44379545

>>44378450

> We need human-friendly CAPTCHAs, as much as they're generally disliked. They're the only solution to the growing spam and abuse problem on the web.

This is wrong, badly wrong.

CAPTCHA stood for “Completely Automated Public Turing test to tell Computers and Humans Apart”. And that’s how people are using such things: to tell computers and humans apart. But that’s not the right problem.

Spam and abuse can come from computers, or from humans.

Productive use can come from humans, or from computers.

Abuse prevention should not be about distinguishing computers and humans: it should be about the actual usage behaviour.

CAPTCHAs are fundamentally solving the wrong problem. Twenty years ago, they were a tolerable proxy for the right problem: imperfect, but generally good enough. But they have become a worse proxy over time.

Also, “human-friendly CAPTCHAs” are just flat-out impossible in the long term. As you identify, it’s only a “for now” thing. Once it’s a target, it ceases to be effective. And the range in humans is so broad that it’s generally distressingly easy to make a bot exceed the lower reaches of human performance.

> Proof-of-work CAPTCHAs work well for making bots expensive to run at scale, but they still rely on accurate bot detection. Avoiding both false positives and negatives is crucial, yet all existing approaches are not reliable enough.

Proof-of-work is even more obviously a temporary solution, security by obscurity: it relies upon symmetry in computation power, which is just wildly incorrect. And all of the implementations I know of have made the bone-headed decision to start with SHA-256 hashing, which amplifies this asymmetry to ludicrous degree (factors of tens of thousands with common hardware, to tens of millions with Bitcoin mining hardware). At that point, forget choosing different iteration counts based on bot detection, it doesn’t even matter.
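For reference, here is a minimal hashcash-style sketch of the kind of SHA-256 proof of work being criticised. It assumes a random server-issued challenge and a "leading zero bits" target; both are illustrative choices, not any specific vendor's protocol, and the function names are hypothetical. The client expects to try about 2^difficulty nonces while the server verifies with a single hash. That brute-force inner loop is what the comment argues runs orders of magnitude faster on some hardware than on others.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a nonzero byte, 8 - bit_length() is its number of leading zeros.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce; expected cost is ~2**difficulty hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash checks the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


# Usage (assumed flow): the server issues a fresh random challenge per request.
challenge = os.urandom(16)
nonce = solve(challenge, difficulty=16)
print(nonce, verify(challenge, nonce, difficulty=16))
```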

—⁂—

The inconvenient truth is: there is no Final Ultimate Solution to the Spam Problem (FUSSP).

2. imiric ◴[25 Jun 25 17:44 UTC] No.44379950

>>44379545 (TP)

> Spam and abuse can come from computers, or from humans.

> Productive use can come from humans, or from computers.

I agree in principle, but the reality is that 37% of all internet traffic originates from bots[1]. The overwhelming majority of that traffic (89% according to Fastly) can be described as abusive. The abusive traffic from humans likely pales in comparison. It's vastly cheaper to set up bot farms than Mechanical Turk farms, and it's only getting cheaper.

Identifying the source of the traffic, while difficult, is a generalizable problem. Tracking specific behavior, on the other hand, depends on each site and will likely require a custom implementation for each type of service. Otherwise it requires invasive tracking of users throughout their session, as many fraud prevention systems do.

Both approaches can be deployed at the same time. A CAPTCHA is not meant to be the only security solution anyway, but a first layer of defense that is generally simple to deploy and maintain.

That said, I concede that the sentence "[CAPTCHAs] are the only solution" is wrong. :)

> Proof-of-work is even more obviously a temporary solution, security by obscurity

I disagree, and don't see how it's security by obscurity. It's simply a method of increasing the access cost for abusive traffic. The more signals are gathered that identify the user as abusive, the higher the "price" they're required to pay to access the service. Whether the user is a suspected bot or not could just be one type of signal. Behavioral and cognitive signals as mentioned in TFA can be others. Yes, these methods aren't perfect: they can mistakenly penalize human users and be spoofed by bots, but they're the best we currently have. This is what I'd like to see improved.
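A minimal sketch of that escalating "price", assuming each detection signal adds a fixed number of difficulty bits to a hashcash-style puzzle. The signal names, weights, base difficulty and cap are all hypothetical; where the signals come from (bot detection, behavioral or cognitive checks) is left abstract.

```python
from typing import Iterable

# Hypothetical signals and weights (extra difficulty bits each). These are
# illustrative assumptions, not taken from any real bot-detection product.
SIGNAL_WEIGHTS = {
    "suspected_bot": 6,
    "datacenter_ip": 4,
    "high_request_rate": 4,
    "no_pointer_movement": 2,
}

BASE_DIFFICULTY = 10  # bits of work for a request with no signals
MAX_DIFFICULTY = 24   # assumed cap so slow legitimate clients are not locked out


def difficulty_for(flags: Iterable[str]) -> int:
    """Raise the price per observed signal; unknown flags add nothing.

    Each extra bit doubles the expected work (about 2**bits hashes).
    """
    extra = sum(SIGNAL_WEIGHTS.get(flag, 0) for flag in set(flags))
    return min(BASE_DIFFICULTY + extra, MAX_DIFFICULTY)


print(difficulty_for([]))                 # → 10 (no signals)
print(difficulty_for(["suspected_bot"]))  # → 16 (one strong signal)
print(difficulty_for(SIGNAL_WEIGHTS))     # → 24 (all signals, capped)
```

With these made-up numbers, a request flagged only as a suspected bot pays about 2^16 = 65,536 expected hashes instead of 2^10 = 1,024, i.e. 64 times the work.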

Still, even with all their faults, I think PoW CAPTCHAs offer a much better UX than traditional CAPTCHAs ever did. Yes, telling humans apart from computers is getting more difficult, but it doesn't mean that the task is pointless.

[1]: https://learn.fastly.com/rs/025-XKO-469/images/Fastly-Threat...

3. Dylan16807 ◴[25 Jun 25 21:31 UTC] No.44382001

>>44379545 (TP)

> Proof-of-work is even more obviously a temporary solution, security by obscurity: it relies upon symmetry in computation power, which is just wildly incorrect. And all of the implementations I know of have made the bone-headed decision to start with SHA-256 hashing, which amplifies this asymmetry to ludicrous degree (factors of tens of thousands with common hardware, to tens of millions with Bitcoin mining hardware). At that point, forget choosing different iteration counts based on bot detection, it doesn’t even matter.

It takes a long time and enormous amounts of money to make new chips for a specific proof of work. And sites can change their algorithm on a dime. I don't think this is a big issue.

4. chrismorgan ◴[26 Jun 25 02:33 UTC] No.44383743

>>44382001

Even disregarding the SHA-256 thing, there is unavoidable significant asymmetry and range that renders proof of work unviable. One legitimate user may use a low-end phone, another may have a high-end desktop that can work a hundred or more times as fast, whatever technique you use, and an attacker may have a botnet.
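To put numbers on that range: at difficulty d a solver needs about 2^d attempts on average, so the expected time is 2^d divided by its hash rate. The rates below are hypothetical round figures (the 100x phone-to-desktop gap mirrors "a hundred or more times"), and the botnet size is an assumption, not a measurement.

```python
# Hypothetical hash rates (hashes/second); real figures vary widely by device,
# browser and implementation.
PHONE_RATE = 50_000
DESKTOP_RATE = 5_000_000
BOTNET_NODES = 10_000  # assumed botnet of desktop-class machines


def expected_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    """Average time to solve one puzzle: about 2**bits attempts at the given rate."""
    return 2 ** difficulty_bits / hashes_per_second


def solves_per_second(difficulty_bits: int, hashes_per_second: float) -> float:
    """Sustained throughput: how many puzzles per second the rate buys."""
    return hashes_per_second / 2 ** difficulty_bits


bits = 20
print(f"phone:   {expected_seconds(bits, PHONE_RATE):.2f} s per puzzle")
print(f"desktop: {expected_seconds(bits, DESKTOP_RATE):.2f} s per puzzle")
print(f"botnet:  {solves_per_second(bits, DESKTOP_RATE * BOTNET_NODES):.0f} puzzles/s")
```

At 20 bits the phone waits about 21 s and the desktop about 0.21 s. Whatever d the server picks, the phone's wait stays 100 times the desktop's, while the botnet's aggregate throughput simply scales with its size.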

It’s important to assume, in security and security-adjacent things, that the attacker has more compute power than the defender. You cannot win in this way.

Proof-of-work is bad rate limiting that relies upon the server having a good estimate of the capabilities of the client. No more, no less.
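For contrast, ordinary server-side rate limiting can be as simple as a token bucket. This is a minimal sketch, assuming one bucket per client key (account, IP or similar, left out here) and illustrative rate and capacity values. The server enforces the limit from its own clock, so it needs no estimate of the client's hardware.

```python
import time


class TokenBucket:
    """Rate limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable clock, handy for testing
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: allow bursts of 5 requests, then 1 request per second on average.
bucket = TokenBucket(rate=1.0, capacity=5.0)
print([bucket.allow() for _ in range(6)])
```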

I bring up the SHA-256 thing as an argument that none of the players in the space are competent. None of them. If you exclude hand-rolled cryptography or known-bad techniques like MD5, SHA-256 is very literally the worst choice remaining: its use in Bitcoin and the rewards available have utterly broken it for this application. If you intend proof of work to actually be the line of defence, you start with something like Argon2d instead. I honestly think that, at this stage, these scripts could replace their proof of work with a “sleep for one second” (maybe adding “or two if I think you’re probably a bot”) routine and have the server trust that they had done so, without compromising their effectiveness.
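A sketch of the memory-hard alternative. Python's standard library has no Argon2d, so this swaps in `hashlib.scrypt`, another memory-hard function, as a stand-in; a real deployment following the suggestion would use an Argon2 binding instead. Each attempt has to fill roughly 128 × r × n bytes (1 MiB with the parameters below), which is meant to narrow the gap between specialised hardware and ordinary clients. The cost parameters, difficulty and function names are illustrative assumptions.

```python
import hashlib

# Illustrative scrypt cost parameters (assumptions): memory per attempt is
# roughly 128 * R * N bytes, i.e. 1 MiB here.
N, R, P = 2 ** 10, 8, 1


def attempt(challenge: bytes, nonce: int) -> bytes:
    # The server's challenge doubles as the salt; the nonce is the "password".
    return hashlib.scrypt(nonce.to_bytes(8, "big"), salt=challenge, n=N, r=R, p=P, dklen=32)


def meets_target(digest: bytes, difficulty: int) -> bool:
    # Require `difficulty` leading zero bits in the digest.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - difficulty) == 0


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: each failed attempt costs a full memory-hard evaluation."""
    nonce = 0
    while not meets_target(attempt(challenge, nonce), difficulty):
        nonce += 1
    return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one scrypt evaluation checks the work."""
    return meets_target(attempt(challenge, nonce), difficulty)


challenge = b"example-challenge"  # in practice, fresh random bytes per request
nonce = solve(challenge, difficulty=4)
print(nonce, verify(challenge, nonce, difficulty=4))
```

Because each attempt is far more expensive than a SHA-256 hash, the difficulty here is only a few bits; the same trade-off about slow legitimate clients still applies.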
