Bot or human? Creating an invisible Turing test for the internet

(research.roundtable.ai)

1. BobbyTables2 ◴[25 Jun 25 15:06 UTC] No.44378200[source]▶

>>44378127 (OP) #

Ironic that we are so intent on creating bots that ask and check questions unsolvable by other bots.

2. JimDabell ◴[25 Jun 25 15:18 UTC] No.44378326[source]▶

>>44378127 (OP) #

This is interesting stuff, but I’d be seriously concerned about this accidentally catching people who have accessibility needs. How is it going to handle somebody using the keyboard to tab through controls instead of the mouse? Is a typing cadence detector going to flag people who use voice interfaces?

replies(1): >>44386740 #

3. qoez ◴[25 Jun 25 15:22 UTC] No.44378354[source]▶

>>44378127 (OP) #

I totally assumed typing cadence and mouse behaviour were incorporated into bot detection for years before this already, interesting.

replies(5): >>44378493 #>>44378607 #>>44378715 #>>44378901 #>>44379408 #

4. chromatin ◴[25 Jun 25 15:23 UTC] No.44378373[source]▶

>>44378127 (OP) #

"When a measure becomes a target, it ceases to be a good measure".

https://en.wikipedia.org/wiki/Goodhart%27s_law

BRB, changing the simulated latency in my bot.

replies(2): >>44378919 #>>44379740 #

5. imiric ◴[25 Jun 25 15:30 UTC] No.44378450[source]▶

>>44378127 (OP) #

I applaud the effort. We need human-friendly CAPTCHAs, as much as they're generally disliked. They're the only solution to the growing spam and abuse problem on the web.

Proof-of-work CAPTCHAs work well for making bots expensive to run at scale, but they still rely on accurate bot detection. Avoiding both false positives and negatives is crucial, yet all existing approaches are not reliable enough.

One comment re:

> While AI agents can theoretically simulate these patterns, the effort likely outweighs other alternatives.

For now. Behavioral and cognitive signals seem to work against the current generation of bots, but will likely also be defeated as AI tools become cheaper and more accessible. It's only a matter of time until attackers can train a model on real human input and inference becomes cheap enough. Or until the benefit of using a bot on a specific target simply outweighs the costs.

So I think we will need a different detection mechanism. Maybe something from the real world, some type of ID, or even micropayments. I'm not sure, but it's clear that bot detection is at the opposite, and currently losing, side of the AI race.

replies(11): >>44378709 #>>44379146 #>>44379545 #>>44380175 #>>44380453 #>>44380659 #>>44380693 #>>44382515 #>>44384051 #>>44387254 #>>44389004 #

6. NoMoreNicksLeft ◴[25 Jun 25 15:32 UTC] No.44378475[source]▶

>>44378127 (OP) #

The problem has never been that some bots could eventually seem like they were human. The problem is and will continue to be that many humans (millions upon millions) look like bots.

Have you never once looked at the captcha and couldn't decide whether the 3 pixels of the motorcycle sticking out into the grid square meant that you should select that grid square too? Not once? As the tests become ever more sophisticated, more and more of you all will be locked out.

replies(2): >>44378778 #>>44381805 #

7. NoMoreNicksLeft ◴[25 Jun 25 15:34 UTC] No.44378493[source]▶

>>44378354 #

You can never go wrong betting on laziness and aversion to ambition for excellence.

8. bgwalter ◴[25 Jun 25 15:45 UTC] No.44378607[source]▶

>>44378354 #

chess.com had this a long time ago.

9. JimDabell ◴[25 Jun 25 15:54 UTC] No.44378709[source]▶

>>44378450 #

> So I think we will need a different detection mechanism. Maybe something from the real world, some type of ID, or even micropayments. I'm not sure, but it's clear that bot detection is at the opposite, and currently losing, side of the AI race.

I think the most likely long-term solution is something like DIDs.

https://en.wikipedia.org/wiki/Decentralized_identifier

A small number of trusted authorities (e.g. governments) issue IDs. Users can identify themselves to third-parties without disclosing their real-world identity to the third-party and without disclosing their interaction with the third-party to the issuing body.

The key part of this is that the identity is persistent. A website might not know who you are, but they know when it’s you returning. So if you get banned, you can’t just register a new account to evade the ban. You’d need to do the equivalent of getting a new passport from your government.

replies(7): >>44378752 #>>44379158 #>>44379293 #>>44379764 #>>44381669 #>>44382394 #>>44387968 #

10. lq9AJ8yrfs ◴[25 Jun 25 15:54 UTC] No.44378715[source]▶

>>44378354 #

You are not wrong.

The article is more of an intro piece for newcomers and doesn't discuss at all the state of the art or where the competition is--the high end of the market is pretty saturated already but the low end is wide open.

There is a bit of a spread in the market, and the specific detection techniques are ofc proprietary and dynamic. Until you have stewed on it quite a bit, it is reasonable to assume that everything you can think of (a) has been tried, (b) is either mainstream or doesn't work well, and (c) that what "working well" means is subtle.

Bots are adversarial and nasty ones play the field. Sources of truth are scarce and expensive to consult, and the costs of false positives are felt acutely by the users and the buyers, vs false negatives are more of a slow burn and a nagging suspicion.

replies(1): >>44379802 #

11. freeone3000 ◴[25 Jun 25 15:58 UTC] No.44378752{3}[source]▶

>>44378709 #

It also allows automated software to act on behalf of a person, which is excellent for assistive technologies and something most current bot detection leaves behind.

replies(1): >>44382747 #

12. baby_souffle ◴[25 Jun 25 16:01 UTC] No.44378778[source]▶

>>44378475 #

Or you'll get the "click all squares with a stop light" prompt and it's a closeup of a signal light so you just click everything... But if you get it correct _and_ too quickly, you're a bot!

13. timshell ◴[25 Jun 25 16:10 UTC] No.44378901[source]▶

>>44378354 #

That's definitely been the marketing. The point of Section 1 is to refute that point

replies(1): >>44380851 #

14. timshell ◴[25 Jun 25 16:11 UTC] No.44378919[source]▶

>>44378373 #

Agreed. Section 3 takes the idea to the extreme -- can a bot replicate human cognition? Traditional OCR CAPTCHAs were a good 'measure' that couldn't be fully gamed. That is, while the rise of computer vision made them eventually ineffective, the gains in computer vision did not come from bot farms

15. kjok ◴[25 Jun 25 16:23 UTC] No.44379062[source]▶

>>44378127 (OP) #

Solutions relying on JavaScript that runs in user-controlled browsers are vulnerable to attacks and manipulation.

replies(1): >>44380313 #

16. TechDebtDevin ◴[25 Jun 25 16:26 UTC] No.44379099[source]▶

>>44378127 (OP) #

I personally work on this all day everyday, you're never going to find my crawlers, stop trying lmfao.

replies(2): >>44379188 #>>44379751 #

17. turnsout ◴[25 Jun 25 16:32 UTC] No.44379146[source]▶

>>44378450 #

Exactly. If the financial incentive is there, they'll add sufficient jitter to trick the detector, and eventually train an ML model to make it even more realistic.

replies(1): >>44379234 #

18. imiric ◴[25 Jun 25 16:33 UTC] No.44379158{3}[source]▶

>>44378709 #

On the one hand, yes, this might work, but I'm concerned that it will inevitably require loss of anonymity and be abused by companies for user tracking. I suppose any type of user identification or fingerprinting is at the expense of user privacy, but I hope we can come up with solutions that don't have these drawbacks.

replies(2): >>44379211 #>>44379275 #

19. charcircuit ◴[25 Jun 25 16:34 UTC] No.44379177[source]▶

>>44378127 (OP) #

>How much can these behavioral patterns be spoofed? This remains an ongoing question, but the evidence to date is optimistic. Academic studies have found behavioral biometrics to be robust against attacks under adversarial conditions, and industry validation from top financial institutions demonstrates real-world resilience

I have the opposite view. This already played out in the Minecraft community and it turns out ghost clients are effective in spoofing such behavioral signals and avoiding anticheat. Also I doubt you can get any meaningful signal from the couple of seconds a user's AI agent is scrolling through a site.

20. erekp ◴[25 Jun 25 16:35 UTC] No.44379188[source]▶

>>44379099 #

same. good luck finding us out there - we can replicate all the patterns you point out there. been in this industry for 10 years now :)

replies(1): >>44382338 #

21. joshmarinacci ◴[25 Jun 25 16:36 UTC] No.44379206[source]▶

>>44378127 (OP) #

I feel like we are fighting the wrong battle here. Eventually AI bot behavior online will be indistinguishable from human, but so what?! We've had teams of underpaid humans being paid to be organic bots for years now.

Whether the person interacting with your website is human or not isn't relevant anymore. What matters is what they are doing; be they human, bot, or AI agent.

replies(2): >>44379368 #>>44380793 #

22. charcircuit ◴[25 Jun 25 16:37 UTC] No.44379211{4}[source]▶

>>44379158 #

The benefit of majorly reducing fraud can create an ecosystem where the trade off is worth it for users to take. For example generous free plans or trials can exist without companies needing to invest so much in antifraud for them.

23. timshell ◴[25 Jun 25 16:38 UTC] No.44379234{3}[source]▶

>>44379146 #

Yes and no. Traditional CAPTCHAs didn't cause bot farms to advance computer vision

replies(4): >>44379973 #>>44380078 #>>44380767 #>>44381765 #

24. JimDabell ◴[25 Jun 25 16:42 UTC] No.44379275{4}[source]▶

>>44379158 #

> I'm concerned that it will inevitably require loss of anonymity and be abused by companies for user tracking.

Are you sure you read my comment fully?

replies(2): >>44379384 #>>44379398 #

25. thatnerd ◴[25 Jun 25 16:43 UTC] No.44379293{3}[source]▶

>>44378709 #

https://www.wired.com/story/worldcoin-sam-altman-orb/

replies(2): >>44379310 #>>44379354 #

26. timshell ◴[25 Jun 25 16:44 UTC] No.44379310{4}[source]▶

>>44379293 #

Yup, Worldcoin has been one of the efforts in this space. We're trying to have a frictionless, less privacy-invasive method than biometric scanning

replies(1): >>44381986 #

27. julkali ◴[25 Jun 25 16:48 UTC] No.44379354{4}[source]▶

>>44379293 #

That is the silicon valley cryptoscam version.

This concept has already been studied extensively, e.g. [1] (in 2000!) by people like Rivest and Chaum, who have decades of actual competence in that field.

[1] https://people.csail.mit.edu/rivest/pubs/pubs/LRSW99.pdf

replies(2): >>44381396 #>>44384295 #

28. butundstand ◴[25 Jun 25 16:48 UTC] No.44379368[source]▶

>>44379206 #

You have to understand the motive to understand why this is a problem; their startups haven’t unicorned yet. They never had a fallback plan so humanity must cling to web app driven economics until they unicorn.

See also Elon demanding ad spend on his platform or like it’s literally just like when the Nazis invaded Poland. Anyone got some E? PLUR, bro but also fewer vacay days for you.

Empty economic activity driven by fiat decree of wealth hoarders suffering from post war and Cold War and leaded gas fume, lead water fueled paranoias and psychosis.

People made insane by memorization of illusory social obligations to history always run the world.

29. Liquix ◴[25 Jun 25 16:50 UTC] No.44379384{5}[source]▶

>>44379275 #

> trusted authorities (e.g. governments)

the governments powerful enough to roll something like this out are not trusted authorities which will protect the privacy of their citizens. remember before the Snowden revelations when the director of national intelligence swore under oath that the NSA did not collect "any type of data at all on millions of Americans"?

https://en.wikipedia.org/wiki/James_Clapper#Testimony_to_Con...

replies(2): >>44383143 #>>44383468 #

30. imiric ◴[25 Jun 25 16:51 UTC] No.44379398{5}[source]▶

>>44379275 #

I did. It doesn't matter that the website might not be able to directly associate a real-world identity with a digital one. It takes a small number of signals to uniquely fingerprint a user, so it's only a matter of associating the fingerprint with the ID, whether that's a real-world or digital one. It can still be used for tracking. By having a static ID that can only be issued by governments or approved agencies we'd only be making things easier for companies to track users.

replies(2): >>44381930 #>>44383498 #

31. ipdashc ◴[25 Jun 25 16:53 UTC] No.44379408[source]▶

>>44378354 #

Yeah, I feel like I'm going crazy looking at that first example video. Was Google's CAPTCHA not supposed to analyze exactly that? Yet the mouse is insta-jumping to the input boxes, the input text is being pasted in instantaneously, and somehow it gets past? That seems utterly trivial to detect. Meanwhile us normal users are clicking on pictures of traffic lights all day?
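To make "utterly trivial to detect" concrete, here is a rough sketch of the kind of check being described: flag input whose keystrokes arrive implausibly fast or with no variation at all. The function name and both thresholds are made up for illustration; nothing here reflects how Google's CAPTCHA or any real product works.

```python
from statistics import mean, pstdev

def looks_scripted(key_times_ms, min_mean_gap_ms=30.0, min_stdev_ms=5.0):
    """Return True when keystroke timing looks pasted or machine-generated.

    key_times_ms: timestamps (in milliseconds) of successive key events.
    Both thresholds are illustrative guesses, not values from a real system.
    """
    if len(key_times_ms) < 3:
        # Not enough evidence either way; the caller should treat this as unknown.
        return False
    gaps = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    # Humans rarely sustain tiny gaps, and their gaps vary from key to key.
    return mean(gaps) < min_mean_gap_ms or pstdev(gaps) < min_stdev_ms
```

A check this naive would also flag password managers, paste, and dictation tools that emit text in chunks, which is exactly the accessibility worry raised elsewhere in the thread.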

replies(2): >>44379425 #>>44379504 #

32. timshell ◴[25 Jun 25 16:54 UTC] No.44379425{3}[source]▶

>>44379408 #

me and you both

33. mitchitized ◴[25 Jun 25 17:00 UTC] No.44379504{3}[source]▶

>>44379408 #

That is because I do not think Google's aims for captcha are the same as ours.

I can tell you that as soon as you download Chrome and login to any Google account of yours, the captcha tests are suddenly and mysteriously gone.

Use firefox in full-lockdown mode, and you will be clicking fire hydrants and crosswalks for the next several hours.

My crazy conspiracy theory is that Google is just using captcha as an opportunity to force everyone out of privacy mode, further empowering the surveillance capitalism engines. The intent is not to be effective, but inconvenient.

replies(1): >>44381135 #

34. bwfan123 ◴[25 Jun 25 17:02 UTC] No.44379527[source]▶

>>44378127 (OP) #

We also need an inverse Turing test, i.e., detect humans pretending to be AI.

Like the case recently of builder.ai which had humans pretending to be ai.

Turing was a visionary - but even he could not imagine a time when humans pretend to be bots.

replies(2): >>44379575 #>>44379579 #

35. chrismorgan ◴[25 Jun 25 17:03 UTC] No.44379545[source]▶

>>44378450 #

> We need human-friendly CAPTCHAs, as much as they're generally disliked. They're the only solution to the growing spam and abuse problem on the web.

This is wrong, badly wrong.

CAPTCHA stood for “Completely Automated Public Turing test to tell Computers and Humans Apart”. And that’s how people are using such things: to tell computers and humans apart. But that’s not the right problem.

Spam and abuse can come from computers, or from humans.

Productive use can come from humans, or from computers.

Abuse prevention should not be about distinguishing computers and humans: it should be about the actual usage behaviour.

CAPTCHAs are fundamentally solving the wrong problem. Twenty years ago, they were a tolerable proxy for the right problem: imperfect, but generally good enough. But they have become a worse proxy over time.

Also, “human-friendly CAPTCHAs” are just flat-out impossible in the long term. As you identify, it’s only a “for now” thing. Once it’s a target, it ceases to be effective. And the range in humans is so broad that it’s generally distressingly easy to make a bot exceed the lower reaches of human performance.

> Proof-of-work CAPTCHAs work well for making bots expensive to run at scale, but they still rely on accurate bot detection. Avoiding both false positives and negatives is crucial, yet all existing approaches are not reliable enough.

Proof-of-work is even more obviously a temporary solution, security by obscurity: it relies upon symmetry in computation power, which is just wildly incorrect. And all of the implementations I know of have made the bone-headed decision to start with SHA-256 hashing, which amplifies this asymmetry to a ludicrous degree (factors of tens of thousands with common hardware, to tens of millions with Bitcoin mining hardware). At that point, forget choosing different iteration counts based on bot detection, it doesn’t even matter.

—⁂—

The inconvenient truth is: there is no Final Ultimate Solution to the Spam Problem (FUSSP).

replies(2): >>44379950 #>>44382001 #

36. jenadine ◴[25 Jun 25 17:05 UTC] No.44379575[source]▶

>>44379527 #

Yet, humans pretending to be machines have existed for centuries https://en.m.wikipedia.org/wiki/Mechanical_Turk

37. hobs ◴[25 Jun 25 17:06 UTC] No.44379579[source]▶

>>44379527 #

Not so far fetched, The Mechanical Turk was created in the 1700s, so that already happened a long time before Turing was born.

38. hinkley ◴[25 Jun 25 17:21 UTC] No.44379737[source]▶

>>44378127 (OP) #

I’ve wanted to create a wiki for a hobby for a long time, but I don’t want to get stuck in spam and abuse reports, which just becomes more of a given with each passing year.

With a hobby wiki, eventual consistency is fine. I believe ghost bans and quarantine and some sort of invisible captcha would go a long way toward my goal, but it’s hard to find invisible captcha.

There was a research project long ago that used high resolution data from keyboards to determine who was typing. The idea was not to use the typing pattern as a password, but to flag suspicious activity. To have someone walk past that desk to see if Sally hurt her arm playing tennis this weekend or if Dave is fucking around on her computer while she’s in a meeting

That’s about the level I’m looking for. Assume everyone is a bot during a probationary period and put accounts into buckets of likely human, likely bot, and unknown.

What I’d have to work out though is temporary storage for candidate edits in a way they cannot fill up my database. A way to throttle them and throw some away if they hit a limit. Otherwise it’s still a DOS attack.
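One way to picture that temporary storage is a bounded queue with a per-account cap, so probationary edits can never grow past a fixed size. This is only a sketch: the class name, the limits, and the drop-oldest policy are all assumptions, and a real wiki would persist this somewhere other than process memory.

```python
from collections import deque

class PendingEdits:
    """Bounded holding area for edits from accounts still on probation."""

    def __init__(self, max_total=1000, max_per_account=5):
        self.max_total = max_total
        self.max_per_account = max_per_account
        self.queue = deque()      # (account, edit) pairs, oldest first
        self.per_account = {}     # account -> number of edits currently queued

    def submit(self, account, edit):
        """Queue an edit; return False if the account has hit its own limit."""
        if self.per_account.get(account, 0) >= self.max_per_account:
            return False
        if len(self.queue) >= self.max_total:
            # Storage is full: throw away the oldest candidate edit.
            old_account, _ = self.queue.popleft()
            self.per_account[old_account] -= 1
        self.queue.append((account, edit))
        self.per_account[account] = self.per_account.get(account, 0) + 1
        return True
```

Dropping the oldest entry keeps the database size fixed, at the cost that a flood from many fresh accounts can still push out legitimate pending edits, so the per-account cap alone does not solve the sockpuppet case.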

replies(2): >>44380894 #>>44381148 #

39. ◴[25 Jun 25 17:21 UTC] No.44379740[source]▶

>>44378373 #

40. ◴[25 Jun 25 17:22 UTC] No.44379751[source]▶

>>44379099 #

41. BiteCode_dev ◴[25 Jun 25 17:24 UTC] No.44379764{3}[source]▶

>>44378709 #

But this means that a SaaS banning you from your account for spurious reasons can now be a serious problem.

replies(2): >>44380206 #>>44383506 #

42. hinkley ◴[25 Jun 25 17:25 UTC] No.44379772[source]▶

>>44378127 (OP) #

I think the real purpose of Google’s recaptcha is to punish people who have privacy settings turned on, and gather training data for AI research.

43. logsr ◴[25 Jun 25 17:26 UTC] No.44379782[source]▶

>>44378127 (OP) #

In a few more years there will probably be virtually no human users of web sites and apps. Everything will be through an AI agent mediation layer. Building better CAPTCHAs is interesting technically, but it is doubling down on a failed solution that nobody actually wants. What is needed is an authentication layer that allows agents to act on behalf of registered users with economic incentives to control usage. CAPTCHA has always been an economic bar only, since they are easy to farm out to human solvers, and it is a very low bar. Having an agent API with usage charges is a much better solution because it compensates operators instead of wasting the cost of solving CAPTCHAs. Maybe this will finally be the era of micro payments?

replies(5): >>44379874 #>>44380264 #>>44383047 #>>44383285 #>>44383688 #

44. hinkley ◴[25 Jun 25 17:29 UTC] No.44379802{3}[source]▶

>>44378715 #

As I understand it detection software is also at great pains to make it difficult for bots to analyze the patterns of rejections to figure out what rule is catching them.

If they can narrow down the possibilities to quadratic space then you lose.

45. mdahardy ◴[25 Jun 25 17:37 UTC] No.44379874[source]▶

>>44379782 #

Co-founder of Roundtable here.

I agree that better authentication methods for AI agents are needed. But right now bots and malicious agents are a real problem for anyone running sites with significant traffic. In the long run I don’t think human traffic will go to zero even if its relative proportion is reduced.

46. imiric ◴[25 Jun 25 17:44 UTC] No.44379950{3}[source]▶

>>44379545 #

> Spam and abuse can come from computers, or from humans.

> Productive use can come from humans, or from computers.

I agree in principle, but the reality is that 37% of all internet traffic originates from bots[1]. The overwhelming majority of that traffic (89% according to Fastly) can be described as abusive. In turn, the abusive traffic from humans likely pales in comparison. It's vastly cheaper to set up bot farms than mechanical turk farms, and it's only getting cheaper.

Identifying the source of the traffic, while difficult, is a generalizable problem. Whereas tracking specific behavior will depend on each site, and will likely require custom implementation for each type of service. Or it requires invasive tracking of users throughout the duration of their session, as many fraud prevention systems do.

Both approaches can be deployed at the same time. A CAPTCHA is not meant to be the only security solution anyway, but as a first layer of defense that is generally simple to deploy and maintain.

That said, I concede that the sentence "[CAPTCHAs] are the only solution" is wrong. :)

> Proof-of-work is even more obviously a temporary solution, security by obscurity

I disagree, and don't see how it's security by obscurity. It's simply a method of increasing the access cost for abusive traffic. The more signals are gathered that identify the user as abusive, the higher the "price" they're required to pay to access the service. Whether the user is a suspected bot or not could just be one type of signal. Behavioral and cognitive signals as mentioned in TFA can be others. Yes, these methods aren't perfect, and can mistakenly penalize human users and be spoofed by bots, but it's the best we currently have. This is what I'd like to see improved.
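As a rough illustration of "the more signals, the higher the price": each signal could add some number of required leading zero bits to the proof-of-work target, with a cap so nobody is locked out forever. The signal names and weights below are invented for the example and are not taken from TFA or from any existing PoW tool.

```python
def difficulty_bits(signals, base_bits=12, max_bits=24):
    """Map suspicion signals to a proof-of-work difficulty.

    signals: dict of signal name -> extra bits of work to demand.
    Each extra bit roughly doubles the expected work for the client.
    """
    extra = sum(signals.values())
    # Cap the total so a misclassified human still gets through eventually.
    return min(max_bits, base_bits + extra)
```

With no signals everyone pays the base price; something like `{"datacenter_ip": 4, "no_pointer_events": 2}` (hypothetical names) would raise it by six bits.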

Still, even with all their faults, I think PoW CAPTCHAs offer a much better UX than traditional CAPTCHAs ever did. Yes, telling humans apart from computers is getting more difficult, but it doesn't mean that the task is pointless.

[1]: https://learn.fastly.com/rs/025-XKO-469/images/Fastly-Threat...

47. mitthrowaway2 ◴[25 Jun 25 17:46 UTC] No.44379973{4}[source]▶

>>44379234 #

Weren't advancing computer vision (and digitizing books) among the goals of ReCAPTCHA? They seem to have been pretty successful with that.

replies(1): >>44379998 #

48. mzmzmzm ◴[25 Jun 25 17:46 UTC] No.44379977[source]▶

>>44378127 (OP) #

All of the behavioral analysis stuff going on in the background makes me wonder if big accessibility problems are brewing. If we're looking at how naturally keystrokes are input, what does that mean for someone who uses dictation tools that generate text in chunks? Will this strategy make accessibility worse in unforeseen ways?

49. timshell ◴[25 Jun 25 17:49 UTC] No.44379998{5}[source]▶

>>44379973 #

Google was successful in creating a labeled dataset for computer vision. That’s different than bot farms beating captchas via computer vision because there exists a financial incentive

50. illegally ◴[25 Jun 25 17:55 UTC] No.44380060[source]▶

>>44378127 (OP) #

It's pointless, it's just a matter of time until AI agents are able to mimic human behavior exactly (they probably already do, it's just not public).

These tests here are easily bypassable: just add a random delay somewhere during the action phases to mimic humans, and there are already tools for mimicking human mouse movements.

51. raincole ◴[25 Jun 25 17:56 UTC] No.44380078{4}[source]▶

>>44379234 #

> Traditional CAPTCHAs didn't cause bot farms to advance computer vision

Are you sure? And how do you know?

There are a lot of CAPTCHA cracking services. Given the price, they are hardly sustainable even under developing country wage level. I believe they actually solve the easy ones automatically and humans are only involved for the harder ones.

52. adityaagr ◴[25 Jun 25 17:57 UTC] No.44380089[source]▶

>>44378127 (OP) #

This is a super clean research post! Absolutely loved the demos too

53. koalaman ◴[25 Jun 25 18:02 UTC] No.44380144[source]▶

>>44378127 (OP) #

I'm not sure reCAPTCHA is really trying to detect automated vs human interaction with a browser. The primary use-case is to detect abusive use. The distinction here is that if I automate my own browser to do things for me on sites using my personal account, that may not be a problem for site owners, while a spam operation or reselling operation which generates thousands of false accounts using automation is a big problem that they'd want to be able to block. I think reCAPTCHA is tailored towards the latter, and for it not to block the former might be more of a feature than a bug.

replies(1): >>44380246 #

54. nico ◴[25 Jun 25 18:05 UTC] No.44380175[source]▶

>>44378450 #

> Proof-of-work CAPTCHAs work well for making bots expensive to run at scale

“Expensive” depends on the value of what you do behind the captcha

There are human-solving captcha services that charge USD 1 for 1k captchas solved (0.1 cents per captcha)

So as long as you can charge more than what solving the captchas cost, you are good to go

Unfortunately, for a lot of tasks, humans are currently cheaper than AI
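The break-even arithmetic is simple enough to write down. This sketch only restates the numbers in the comment above; the function names are made up and the value per action is whatever the attacker earns behind the captcha.

```python
def cost_per_captcha_usd(price_per_1k_usd):
    """Convert a solver's price per 1,000 captchas into a per-captcha cost."""
    return price_per_1k_usd / 1000

def solving_is_profitable(price_per_1k_usd, value_per_action_usd):
    """True when one solved captcha earns more than it costs to solve."""
    return value_per_action_usd > cost_per_captcha_usd(price_per_1k_usd)
```

At USD 1 per 1k, anything worth more than a tenth of a cent per action clears the bar.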

replies(2): >>44380306 #>>44380717 #

55. econ ◴[25 Jun 25 18:08 UTC] No.44380206{4}[source]▶

>>44379764 #

You could roll a new id to replace the previous one. Each user would still have only one at a time. If this isn't acceptable a service may ask to have the feature disabled for clear mission critical reasons and/or a fee.

56. roguecoder ◴[25 Jun 25 18:12 UTC] No.44380246[source]▶

>>44380144 #

LinkedIn, for example, doesn't care if you as a human are manually looking at all your connections one-by-one or if you have automated a bot to do it: it will lock you out the same either way.

57. contagiousflow ◴[25 Jun 25 18:14 UTC] No.44380264[source]▶

>>44379782 #

> Building better CAPTCHAs is interesting technically, but it is doubling down on a failed solution that nobody actually wants

I want it. I don't want my message boards to be people's AI agents...

58. econ ◴[25 Jun 25 18:16 UTC] No.44380306{3}[source]▶

>>44380175 #

There must be hilarious undiscovered Rube Goldberg machines out there where a human completes a captcha, then the host sells the captcha to the seller who passes it to next user who passes it to the next website who sells it again and so on.

59. _df ◴[25 Jun 25 18:17 UTC] No.44380313[source]▶

>>44379062 #

>Solutions relying on JavaScript ...

... break the Web.

ftfy

60. b0a04gl ◴[25 Jun 25 18:18 UTC] No.44380326[source]▶

>>44378127 (OP) #

assume this is basically nosedive but for presence on the internet. except you don't rate anyone. your device, motion, latency, and scroll inertia get rated by some pipeline you’ll never see. and that’s what decides what version of the site you get.

> what if the turing test already runs silently across every site you open. just passive gating based on scroll cadence, mouse entropy, input lag without captcha or prompt

>what if you already failed one today. maybe your browser fingerprint was too rare, maybe your keyboard rhythm matched a bot cluster from six months ago. so the UI throttled by 200ms. or the request just 403'd.

> what if the system doesn't need to prove you're a bot. it just needs a small enough doubt to skip serving you the real content.

> what if human is no longer biological but statistical. a moving average of behavior trained on telemetry from five metro cities. everyone outside that gets misclassified.

>what if you'll never know. your timeline just loads emptier than someone else's, with no explicit rejection of the content

61. loandbehold ◴[25 Jun 25 18:23 UTC] No.44380369[source]▶

>>44378127 (OP) #

Don't those distinctions only work because bots aren't specifically designed to circumvent them? If you have an arms race between bots and bot detectors, eventually bots will learn to overcome them to the point that you can't distinguish human and bot.

62. dataviz1000 ◴[25 Jun 25 18:30 UTC] No.44380453[source]▶

>>44378450 #

1. Create a website with a series of tasks to capture this data.

2. Send link to coworkers via Slack so they can spend five minutes doing the tasks.

3. Capture that data and create thousands of slight variations saved to db as profiles

4. Bypass bot protections.

There is nothing anyone can do to prevent bots.

replies(1): >>44381744 #

63. lucb1e ◴[25 Jun 25 18:34 UTC] No.44380493[source]▶

>>44378127 (OP) #

And so what am I supposed to do if a false positive happens?

I use keyboard navigation on many pages. Using the firefox setting "search when you start typing", I don't have to hit ctrl+f to search on the page, I just type what I want to click on and press enter or ctrl+enter for a new browser tab, or press (shift+)tab to go to the nearest (previous/next) input field. When I open HN, it's muscle memory: ctrl+t (new tab) new enter (autocompletes to the domain) thr enter (go to threads page) anything new? type first few chars of username, shift+tab+tab enter to upvote. Done? Backspace to go back. View comments of a link? Type last char of a word in the link, space, and first char of next word, that's almost always unique on the page, then escape, type men, enter, to almost always activate the comment link. Or shift+tab enter instead to upvote. On the comments page, reading top-level comments is either searching for [ and then enter+f3 when I want to collapse the next one, space for page down... Don't have to take my hands off the home row

etc. on lots of websites, also ones I've never visited before (it'll be slower and less habitual of course, but still: if there is text near to where I want to go, I'm typing it). I use the mouse as well, but I find it harder to use than the keys that are always in the same place, much easier to press

So will it tell me that my mouse movements don't look human enough or will I see a "Sorry, something went wrong" http 403 error and have no clue if it's tracking cookies, my IP address, that I don't use Google Chrome®, that I went through pages too fast, that I didn't come past the expected page (where a cookie gets set) but clicked on a search result directly, that I have a bank in country A but residence in country B, that I now did too many tries in figuring out which of these factors is blocking me.... I can give examples of websites where I got blocked in the last ~2 months for each of these. It's such a minefield. The only thing that always passes is proof-of-work CPU challenges, but I dread to think what poor/eco people with slow/old computers are facing. Will this "invisible" captcha (yeah, invisible until you get banned) at least tell me how I'm supposed to give my money to whatever service or webshop will use this?

replies(1): >>44383110 #

64. lucb1e ◴[25 Jun 25 18:50 UTC] No.44380659[source]▶

>>44378450 #

> but [PoWs] still rely on accurate bot detection.

No they don't, that's the point: you can serve everyone a PoW and don't have to discriminate and ban real people. This system you're enthusiastic about is what tries to do this "accurate bot detection" (scratch the first word)

replies(1): >>44380773 #

65. msgodel ◴[25 Jun 25 18:53 UTC] No.44380693[source]▶

>>44378450 #

Everything on the web is a robot, every client is an agent for someone somewhere, some are just more automated.

Distinguishing en masse seems like a waste to me. Deal with the actual problems like resource abuse.

I think part of the issue is that a lot of people are lying to themselves that they "love the public" when in reality they really don't and want nothing to do with them. They lack the introspection to untangle that though and express themselves with different technical solutions.

replies(1): >>44380995 #

66. msgodel ◴[25 Jun 25 18:55 UTC] No.44380717{3}[source]▶

>>44380175 #

PoW captchas aren't actually captchas, it's just hashcash (i.e. make sure the person reading the content is using as much or more compute than you are using to serve it, so they can't DoS you either on purpose or by accident). We stopped needing it for a while because compute and bandwidth grew really fast while server-side software mostly stayed the same.
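A minimal sketch of that idea, assuming a Hashcash-style scheme where the client must find a nonce whose hash starts with a given number of zero bits. The function names, the `challenge:nonce` string format, and the choice of SHA-256 are illustrative assumptions here, not the format of any particular implementation.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # First non-zero byte: count its leading zero bits and stop.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty_bits hashes on average)."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one hash, however long the client had to search."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


if __name__ == "__main__":
    # Hypothetical challenge string; a real server would make it unique per request.
    nonce = solve("example.org/page-42", difficulty_bits=12)
    print(nonce, verify("example.org/page-42", nonce, 12))
```

The asymmetry is the point: the requester pays for the search, while checking the answer costs the server a single hash.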

replies(1): >>44381732 #

67. lucb1e ◴[25 Jun 25 19:01 UTC] No.44380767{4}[source]▶

>>44379234 #

I don't see how that contradicts the parent post. Computer vision wasn't as good when reCAPTCHA was still typing out books, but machine learning has (per my expectation, having worked with it since ~2015, but the proof would be in the pudding) likely been good enough for mimicking e.g. keystroke timings for decades. It hasn't been needed until now. That doesn't mean they won't use it now that it is needed. Different situation from where tech did not yet exist

replies(1): >>44381088 #

68. vhcr ◴[25 Jun 25 19:01 UTC] No.44380773{3}[source]▶

>>44380659 #

The default policy of Anubis tries to detect bots and changes the difficulty of the proof of work based on that.

https://github.com/TecharoHQ/anubis/blob/main/data/botPolici...

replies(1): >>44380798 #

69. Terr_ ◴[25 Jun 25 19:03 UTC] No.44380793[source]▶

>>44379206 #

IMO in most cases, the real need is to ensure the new account has "skin in the game", so that their requests are not frivolous and they will "care" about the good standing of their account.

70. lucb1e ◴[25 Jun 25 19:03 UTC] No.44380798{4}[source]▶

>>44380773 #

Oh... that I regularly see these pages working on a challenge probably says something about my humanness

71. lucb1e ◴[25 Jun 25 19:08 UTC] No.44380851{3}[source]▶

>>44378901 #

I had a security manager at a big bank (one of my first clients) tell me straight to my face that the website decides whether to let me in before I even start typing the password(-equivalent) and that the password is just a formality not to scare people. Near as I could tell, he believed it himself.

Marketing indeed. He had me doubting for a while what magic they weren't sharing with the rest of us to avoid countermeasures being developed, but I know better now (working in infosec, seeing what these systems catch, don't catch, and bycatch)

72. lucb1e ◴[25 Jun 25 19:13 UTC] No.44380894[source]▶

>>44379737 #

How does one graduate from probation, while being hellbanned / having your contribution quarantined? Since I'm certainly not wasting my time doing a second contribution so long as the first one isn't getting approved, it sounds like this would have to be a manual process or you'd lose out on new contributors that are seeing their work go to /dev/null and never returning

replies(1): >>44381067 #

73. bobbiechen ◴[25 Jun 25 19:25 UTC] No.44380995{3}[source]▶

>>44380693 #

I do think the answer is two-pronged: roll out the red carpet for "good bots", add friction for "bad bots".

I work for Stytch and for us, that looks like:

1) make it easy to provide Connected Apps experiences, like OAuth-style consent screens "Do you want to grant MyAgent access to your Google Drive files?"

2) make it easy to detect all bots and shift them towards the happy path. For example, "Looks like you're scraping my website for AI training. If you want to see the content easily, just grab it all at /LLMs.txt instead."

As other comments mention, bot traffic is overwhelmingly malicious. Being able to cheaply distinguish bots and add friction makes your life as a defending team much easier.

replies(1): >>44381023 #

74. msgodel ◴[25 Jun 25 19:29 UTC] No.44381023{4}[source]▶

>>44380995 #

IMO if it looks like a bot and doesn't follow robots.txt you should just start feeding it noise. Ignoring robots.txt makes you a bad netizen.
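A rough sketch of how a server could check a request against its own robots.txt, using Python's standard `urllib.robotparser`. The robots.txt contents, the `looks_like_bot` flag (assumed to come from some separate detector), and the "noise" response are all hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being defended.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that the server already has on hand."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def violates_robots(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """True if a client announcing `user_agent` requested a disallowed path."""
    return not parser.can_fetch(user_agent, path)


def respond(parser: RobotFileParser, user_agent: str, path: str, looks_like_bot: bool) -> str:
    # Only clients that both look automated and ignore robots.txt get noise.
    if looks_like_bot and violates_robots(parser, user_agent, path):
        return "noise"
    return "content"


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    print(respond(parser, "ExampleBot", "/private/data.html", looks_like_bot=True))
```

Note that this only sees the user agent the client chooses to announce, so it catches careless crawlers rather than ones that lie about who they are.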

75. hinkley ◴[25 Jun 25 19:34 UTC] No.44381067{3}[source]▶

>>44380894 #

Do you believe what we are doing now is working? Because with the exception of places like this the internet sure looks pretty Dead to me.

You always have to show people their own edits. It's a common form of proofreading. But what's added and how often does matter. Misinformation is one thing. External links are potentially something much worse. I used to think SO had it figured out as far as mutual policing, but that's not working so well now either.

replies(1): >>44381691 #

76. Animats ◴[25 Jun 25 19:35 UTC] No.44381081[source]▶

>>44378127 (OP) #

Previous CAPTCHAs were based on tasks humans could do but machines could not. The machines caught up and passed humans on those tasks. These new tasks are based on the concept that humans are dumber than AI agents, making more mistakes and showing more randomness.

It might work for a while, but that's a losing battle.

replies(1): >>44381137 #

77. timshell ◴[25 Jun 25 19:36 UTC] No.44381088{5}[source]▶

>>44380767 #

Section 3 anticipates and addresses this objection.

The ultimate challenge is to replicate end-to-end natural human cognition, which is currently an unsolved and hard problem (and also not necessarily the main focus of AI researchers).

78. Animats ◴[25 Jun 25 19:39 UTC] No.44381135{4}[source]▶

>>44379504 #

Yes. As someone who runs with Firefox in full lockdown mode, including Privacy Badger and total blocking of Google Tag Manager, I have to click on a lot of fire hydrants and crosswalks.

Very few sites are broken by blocking Google's features, incidentally. Even Privacy Badger warns that blocking Google Tag Manager may break sites. It doesn't break anything important.

replies(2): >>44382416 #>>44388108 #

79. timshell ◴[25 Jun 25 19:39 UTC] No.44381137[source]▶

>>44381081 #

> These new tasks are based on the concept that humans are dumber than AI agents, making more mistakes and showing more randomness.

Hi this is incorrect. Different =/= dumber. The insight is that humans and computers have different constraints / algorithmic capabilities / objective functions / etc.

replies(1): >>44381150 #

80. timshell ◴[25 Jun 25 19:40 UTC] No.44381148[source]▶

>>44379737 #

Happy to help if I can :)

81. Animats ◴[25 Jun 25 19:40 UTC] No.44381150{3}[source]▶

>>44381137 #

For a few more years, humans who haven't been laid off yet can believe that.

82. calvinmorrison ◴[25 Jun 25 20:12 UTC] No.44381396{5}[source]▶

>>44379354 #

Or just charge bots and humans and we're good to go

https://www.nytimes.com/2006/02/05/technology/postage-is-due...

replies(2): >>44381899 #>>44381923 #

83. avoutos ◴[25 Jun 25 20:31 UTC] No.44381546[source]▶

>>44378127 (OP) #

Anyone know how this compares to Cloudflare Turnstile?

84. renegat0x0 ◴[25 Jun 25 20:47 UTC] No.44381659[source]▶

>>44378127 (OP) #

So recently two things have happened. I have been banned from Reddit's technology subreddit, and warned on another subreddit that I behave like a bot.

Maybe it was my fault to advertise my own solution in comments.

Such behavior, however, triggered bot detection. I might have behaved like an NPC. So currently a human can be identified as a bot, and banned on that premise. Crazy times.

Currently I feel I must act like a human.

replies(1): >>44383082 #

85. johnisgood ◴[25 Jun 25 20:48 UTC] No.44381669{3}[source]▶

>>44378709 #

I have not heard about DIDs at all before. How does this really work? They are Government-issued? I am not sure I would trust that though.

86. lucb1e ◴[25 Jun 25 20:50 UTC] No.44381691{4}[source]▶

>>44381067 #

I'm not sure what, e.g., showing someone their own change answers. Do you manually review submissions, or how does one get out of this initial "put everyone in quarantine" state?

I'm also not sure what "we" are doing now that makes the web look dead to you. I receive no more email spam than ten years ago, less if anything, and I haven't seen any spam on the places that I frequent like HN, stackexchange, wikipedia, mastodon, signal, github, etc.

replies(1): >>44382492 #

87. johnisgood ◴[25 Jun 25 20:54 UTC] No.44381732{4}[source]▶

>>44380717 #

Agreed, it indeed is Hashcash. I love it. So simple yet effective.

http://www.hashcash.org

https://en.bitcoin.it/wiki/Hashcash

https://en.wikipedia.org/wiki/Hashcash

C implementation (feature-rich): https://github.com/hashcash-org/hashcash/tree/master/c

A Factor (Forth-like language) implementation of it: https://github.com/factor/factor/blob/master/extra/hashcash/...

88. ATechGuy ◴[25 Jun 25 20:56 UTC] No.44381744{3}[source]▶

>>44380453 #

> There is nothing anyone can do to prevent bots.

Are you sure about this?

replies(1): >>44382417 #

89. turnsout ◴[25 Jun 25 20:58 UTC] No.44381765{4}[source]▶

>>44379234 #

It's possible they didn't advance computer vision, but they certainly applied it.

90. gus_massa ◴[25 Jun 25 21:02 UTC] No.44381805[source]▶

>>44378475 #

Is the guy on the motorcycle part of the motorcycle? I guess no.

Is the big box on the back seat part of the motorcycle? I guess yes.

Who can be sure???

91. ATechGuy ◴[25 Jun 25 21:04 UTC] No.44381822[source]▶

>>44378127 (OP) #

Please don't deploy this on the internet, it may block real users and lock them out.

92. throwaway48476 ◴[25 Jun 25 21:09 UTC] No.44381857[source]▶

>>44378127 (OP) #

Not all automation is malicious. AI promised us agents that will browse the web for us. PoW is useful in that the difficulty can be scaled to prevent egregious abuse but still lower the cost enough to allow non malicious use.

replies(1): >>44383706 #

93. TJSomething ◴[25 Jun 25 21:14 UTC] No.44381899{6}[source]▶

>>44381396 #

While that works for attacks that are like spam, bot detection for high margin attacks like show ticket scalping really wants an identity-oriented solution.

94. servercobra ◴[25 Jun 25 21:18 UTC] No.44381923{6}[source]▶

>>44381396 #

Ah yes, postage has stopped all the spam coming to my house!

replies(1): >>44382484 #

95. Dylan16807 ◴[25 Jun 25 21:19 UTC] No.44381930{6}[source]▶

>>44379398 #

This sounds like a red herring to me.

If the only way to associate a user with their ID is by fingerprinting them, you can do the same thing without an ID by keeping shadow profiles. If the proof system is designed for privacy, the ID doesn't make you more trackable.

In other words, if the ID never directly leaks, companies can just make up a static ID for you and get the same results.

replies(1): >>44382103 #

96. lugu ◴[25 Jun 25 21:21 UTC] No.44381949[source]▶

>>44378127 (OP) #

It is late and I am thinking out loud. How about a reputation system where users bring proof that other websites haven't found them abusive?

Visit a website that requires identification. Generate a random unique identifier in your user agent. Live your life on that site. Download from that site a certificate that proves that you didn't abuse their site. Repeat that a few times.

Visit the site that wants to know if you are an abusive user. Share your certificates. They get to choose if they accept you.

If you abuse that site, it reports the abuse to the other sites that delivered you a certificate. Those sites get to decide if they revoke their certificate or not.

It is a self-policing system that requires some level of cooperation. Users make themselves vulnerable to the risk of having sites they like lose trust in them.
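A toy in-memory model of that flow, as a sketch only. The `Site` class, the `directory` lookup, and the `minimum_vouches` policy are hypothetical names; a real system would need signed certificates and some way to reach issuers, which are left out here.

```python
import secrets


class Site:
    """A site that can vouch for pseudonymous visitors (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.issued = {}  # certificate id -> True once revoked

    def issue_certificate(self) -> tuple:
        # Random identifier, not tied to a real-world name.
        cert_id = secrets.token_hex(16)
        self.issued[cert_id] = False
        return (self.name, cert_id)

    def is_valid(self, cert_id: str) -> bool:
        # Valid only if this site issued it and has not revoked it.
        return self.issued.get(cert_id) is False

    def receive_abuse_report(self, cert_id: str, revoke: bool) -> None:
        # Each issuer decides for itself whether a report warrants revocation.
        if revoke and cert_id in self.issued:
            self.issued[cert_id] = True


def accept_visitor(certificates, directory, minimum_vouches: int = 2) -> bool:
    """The relying site's own policy: require enough still-valid certificates."""
    valid = sum(
        1
        for site_name, cert_id in certificates
        if site_name in directory and directory[site_name].is_valid(cert_id)
    )
    return valid >= minimum_vouches


if __name__ == "__main__":
    a, b = Site("a.example"), Site("b.example")
    directory = {"a.example": a, "b.example": b}
    certs = [a.issue_certificate(), b.issue_certificate()]
    print(accept_visitor(certs, directory))
    a.receive_abuse_report(certs[0][1], revoke=True)
    print(accept_visitor(certs, directory))
```

Each site keeps its own acceptance threshold and its own revocation decision, which matches the "they get to choose" part of the proposal.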

replies(6): >>44382023 #>>44382106 #>>44382403 #>>44382406 #>>44383816 #>>44387374 #

97. jskrn ◴[25 Jun 25 21:27 UTC] No.44381986{5}[source]▶

>>44379310 #

Do you work for Worldcoin?

replies(1): >>44382375 #

98. Dylan16807 ◴[25 Jun 25 21:31 UTC] No.44382001{3}[source]▶

>>44379545 #

> Proof-of-work is even more obviously a temporary solution, security by obscurity: it relies upon symmetry in computation power, which is just wildly incorrect. And all of the implementations I know of have made the bone-headed decision to start with SHA-256 hashing, which amplifies this asymmetry to ludicrous degree (factors of tens of thousands with common hardware, to tens of millions with Bitcoin mining hardware). At that point, forget choosing different iteration counts based on bot detection, it doesn’t even matter.

It takes a long time and enormous amounts of money to make new chips for a specific proof of work. And sites can change their algorithm on a dime. I don't think this is a big issue.

replies(1): >>44383743 #

99. rcstank ◴[25 Jun 25 21:35 UTC] No.44382023[source]▶

>>44381949 #

Sounds like a privacy nightmare. Also, what one site calls abuse, another wouldn't.

100. imiric ◴[25 Jun 25 21:46 UTC] No.44382103{7}[source]▶

>>44381930 #

Kind of. A fingerprint is an implicit ID, whereas the ID suggested by GP would be semi-permanently associated to an individual. So it would make tracking even easier, since most web sites outside of adtech don't bother with sophisticated fingerprinting. It would be similar to a tracking cookie, except the user would have no control over it.

replies(1): >>44382237 #

101. spondylosaurus ◴[25 Jun 25 21:47 UTC] No.44382106[source]▶

>>44381949 #

Some stuff would definitely either slip through the cracks OR tarnish the reputation of legitimate users. What happens when someone's device gets compromised by a botnet that silently clicks ads in the background or turns that device into part of a DDoS army?

replies(1): >>44382221 #

102. thatcat ◴[25 Jun 25 22:01 UTC] No.44382198[source]▶

>>44378127 (OP) #

If the general Internet was based on torrents, then the required upload ratio enforcement would have ensured bots contribute to the reliability rather than destabilize the infrastructure.

replies(1): >>44383091 #

103. MichaelZuo ◴[25 Jun 25 22:04 UTC] No.44382221{3}[source]▶

>>44382106 #

Why would anyone even expect a perfectly zero false-positive and false-negative rate in the first place?

104. Dylan16807 ◴[25 Jun 25 22:07 UTC] No.44382237{8}[source]▶

>>44382103 #

> the ID suggested by GP would be semi-permanently associated to an individual

There is a permanent ID, but it doesn't have to be told to the site.

In which case it doesn't make tracking any easier than the site making up a "fake" ID for you.

105. Dylan16807 ◴[25 Jun 25 22:25 UTC] No.44382338{3}[source]▶

>>44379188 #

Just don't cause problems on purpose and almost nobody will care about blocking you. Don't be an asshole.

replies(1): >>44385747 #

106. ◴[25 Jun 25 22:33 UTC] No.44382375{6}[source]▶

>>44381986 #

107. encom ◴[25 Jun 25 22:35 UTC] No.44382394{3}[source]▶

>>44378709 #

I have to ask the government for a roided up tracking cookie?

Hell. No.

108. lq9AJ8yrfs ◴[25 Jun 25 22:36 UTC] No.44382403[source]▶

>>44381949 #

> It is a self-policing system that requires some level of cooperation.

How hard is it to obtain one of these certificates as a bot?

What you are describing though is possibly comparable to Privacypass.

Apple seems to be on board with Privacypass, perhaps they'll include a digital voucher of some kind with their devices and that presumably contributes to old devices getting worse as the voucher is spent down.

Just imagine if the whole web can contribute to planned obsolescence and you can pay for a fast, hassle free internet experience again just by buying a new phone.

And then you can dump the old ones on eBay for cheap as long as you don't plan on using them to access online services. Unless you are willing to settle for basic economy web experience.

109. awb ◴[25 Jun 25 22:36 UTC] No.44382406[source]▶

>>44381949 #

PageRank worked well for Google for a long time. This sounds like an adaptation of that that’s interesting to consider.

110. busymom0 ◴[25 Jun 25 22:37 UTC] No.44382416{5}[source]▶

>>44381135 #

For me it's having to click on bikes. Except the pictures are of motorcycles and not bicycles. English isn't my first language, so when I hear bike, I am thinking of bicycles and not motorcycles.

111. dataviz1000 ◴[25 Jun 25 22:37 UTC] No.44382417{4}[source]▶

>>44381744 #

I was part of the team managing tens of millions of dollars’ worth of NFL event-ticket inventory, which meant I had to automate the Ticketmaster UI to delist any ticket that was put into checkout or sold on a secondary market like StubHub. For legal reasons, Ticketmaster wouldn’t grant us direct access to their private API while they were still building out the developer API (which our backend team actually helped design), so I spent about half my time reverse-engineering and circumventing their bot protections on Ticketmaster, SeatGeek, StubHub, etc. I made it very clear that anyone caught using my code to automate ticket purchases would face serious consequences.

At the time, Ticketmaster’s anti-bot measures were the gold standard. They gave us fair warning that they planned to implement Mastercard’s SaaS-based solution (same as described in OP’s article), so I had everyone on the team capture keyboard-typing cadence, mouse movements, and other behavioral metrics. I used that as the excuse to build a Chrome extension that handled all of those tasks, and I leaned on the backend team to stop procrastinating and integrate the new API endpoints that Ticketmaster was rolling out. For about a week, that extension managed millions of dollars in inventory—until I got our headless browsers back up and running.

In the end, any lock can be picked given enough time; its only real purpose is to add friction until attackers move on to an easier target. But frankly, nobody can stop me from scraping data or automating site interactions if it’s more profitable than whatever else I could be working on. I have some ideas how to prevent me from using automated bots but all of the companies I've applied to over the years never respond -- that's on them.

112. throw10920 ◴[25 Jun 25 22:46 UTC] No.44382484{7}[source]▶

>>44381923 #

This is an extremely ignorant take. It's extremely well-known that one of the primary ways you stop spam is by making it economically infeasible, specifically by making the cost of distribution higher than the expected return. It's also extremely well-known that spam snail-mail is subsidized by the US post office and doesn't pay normal post rates.

replies(1): >>44387398 #

113. busymom0 ◴[25 Jun 25 22:47 UTC] No.44382492{5}[source]▶

>>44381691 #

> and I haven't seen any spam on the places that I frequent like HN, stackexchange, wikipedia, mastodon, signal, github, etc.

Could that just be because the modern LLM generated spam doesn't look like old-school spam? Just recently we learnt that a university conducted a study on Reddit's changemyview subreddit using LLM generated comments without getting caught.

replies(1): >>44389630 #

114. __MatrixMan__ ◴[25 Jun 25 22:50 UTC] No.44382515[source]▶

>>44378450 #

> They're the only solution to the growing spam and abuse problem on the web

They're the only solution that doesn't require a pre-existing trust relationship, but the web is more of a dark forest every day and captchas cannot save us from that. Eventually we're going to have to buckle down and maintain a web of trust.

If you notice abuse, you see which common node caused you to trust the abusers, and you revoke trust in that node (and, transitively, everything that it previously caused you to trust).
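A small sketch of that revocation step, treating the web of trust as a directed graph of "who vouched for whom". The node names and the `vouches` mapping are made up for illustration, and `common_vouchers` only looks at direct vouchers, which is a simplification.

```python
from collections import deque


def trusted_set(root, vouches, revoked=frozenset()):
    """Everyone reachable from `root` via vouching edges, skipping revoked nodes."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for other in vouches.get(node, ()):
            if other not in seen and other not in revoked:
                seen.add(other)
                queue.append(other)
    return seen


def common_vouchers(abusers, vouches):
    """Nodes that directly vouched for every one of the abusers."""
    return {node for node, targets in vouches.items() if set(abusers) <= set(targets)}


if __name__ == "__main__":
    # Hypothetical graph: "me" trusts alice and bob directly.
    vouches = {
        "me": ["alice", "bob"],
        "alice": ["carol"],
        "bob": ["carol", "mallory"],
        "mallory": ["spam1", "spam2"],
    }
    culprits = common_vouchers(["spam1", "spam2"], vouches)
    print(culprits)
    print(sorted(trusted_set("me", vouches, revoked=culprits)))
```

Recomputing reachability after the revocation is what makes it transitive: anything that was trusted only through the revoked node drops out, while a node with another path (carol here) stays.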

replies(1): >>44384614 #

115. curtisblaine ◴[25 Jun 25 23:06 UTC] No.44382632[source]▶

>>44378127 (OP) #

AFAIK, reCaptcha is not based on user behaviour anymore since v3, but uses proprietary network host information from Google. See this issue opened in 2018: https://github.com/google/recaptcha/issues/235

116. Nevermark ◴[25 Jun 25 23:16 UTC] No.44382683[source]▶

>>44378127 (OP) #

The problem becomes simpler if you turn it around.

It is getting easier and easier to create questions/problems that humans can't answer at LLM speed.

Of course, that solves a complementary problem, not the original. But in terms of instances, by any definition, the demographics are quickly moving in one direction.

117. timshell ◴[25 Jun 25 23:26 UTC] No.44382747{4}[source]▶

>>44378752 #

I think this will be a positive effect of the rise of AI agents. We’re going to have a much different distribution of automated vs human traffic and authentication/methods will have to be more robust than they are now

118. emporas ◴[26 Jun 25 00:17 UTC] No.44383047[source]▶

>>44379782 #

Certainly. An authentication layer, and everything else customizable by the user.

The web, HTML that is, is a grammar, an app is a grammar, the buttons of my car are a grammar, I want each grammar served, transformed to my grammar however I like it, probably org-mode file grammar.

I don't want each website's colors, or clickable elements to be determined by any other person than the user. There are themes, I want to select exactly what theme I am browsing the internet with today. I also want my fridge to be connected to the internet, accessed using an authentication layer on top of IPv6, and using its functionality with a grammar.

In other words, the web, browsers, apps and physical buttons will go down the drain soon and they will be replaced by something which can open and manipulate org filetypes.

The web was/is a huge financial bubble anyway, and it will burst quickly when that happens.

119. ivanjermakov ◴[26 Jun 25 00:23 UTC] No.44383082[source]▶

>>44381659 #

I read your comment with a robot voice, kidding. False positives are always gonna be the problem.

120. ivanjermakov ◴[26 Jun 25 00:25 UTC] No.44383091[source]▶

>>44382198 #

The problem would move from verifying legitimacy of agent actions to legitimacy of agent reputation, which might even be easier to spoof.

121. ivanjermakov ◴[26 Jun 25 00:28 UTC] No.44383110[source]▶

>>44380493 #

I also use keyboard navigation and Vimium's quick link actions, and I often catch CAPTCHAs and rate limiting on some websites because I'm too fast. Fun times!

122. HeatrayEnjoyer ◴[26 Jun 25 00:37 UTC] No.44383143{6}[source]▶

>>44379384 #

Ultimately trust must be placed in an entity of some type. A democratically elected body isn't perfect but I can't think of a better option. If the electorate don't care about digital privacy or elected lawmakers do not protect their rights, then that needs to be addressed first. Governments have a monopoly on violence. If a citizen can't trust their government to enact (or enact but then not follow) laws that protect human rights, they frankly have much bigger problems to solve.

replies(1): >>44383763 #

123. catlifeonmars ◴[26 Jun 25 01:03 UTC] No.44383285[source]▶

>>44379782 #

In this few years scenario why would there be a need for websites anyway? The bots can just use APIs.

124. protocolture ◴[26 Jun 25 01:34 UTC] No.44383459[source]▶

>>44378127 (OP) #

I have noticed that a particular website will tell me I fail captcha half the time, until I resize my browser from a square to a rectangle.

Took me ages to figure out what its issue was.

125. JimDabell ◴[26 Jun 25 01:37 UTC] No.44383468{6}[source]▶

>>44379384 #

> the governments powerful enough to roll something like this out are not trusted authorities which will protect the privacy of their citizens.

The trust I mentioned was the ability for third-parties to trust that the authority will not hand out IDs in an uncontrolled manner. I was not saying that the ID holders need to trust the authority:

> Users can identify themselves to third-parties without disclosing their real-world identity to the third-party and without disclosing their interaction with the third-party to the issuing body.

If the authority doesn’t know how your ID is used, you don’t have to trust the authority to keep that information private.

126. JimDabell ◴[26 Jun 25 01:44 UTC] No.44383498{6}[source]▶

>>44379398 #

> It can still be used for tracking.

This doesn’t make sense. The whole point of using IDs in this way is in an authenticated context.

Did you think I was suggesting that this ID would be accessible to any website without asking? This is something you would send as part of a registration step. So, for instance, if you spam Hacker News, you get banned, you try to register again, it receives the same ID as before and knows not to let you register.
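One way to picture this, as a loose sketch: the identifier a site receives at registration is derived per site, so a ban sticks on that site while other sites see an unrelated value. The HMAC derivation, `holder_secret`, and the `Registrar` class are assumptions made for illustration; this is not how any specific ID scheme works, and it leaves out the part where an issuer attests that the pseudonym belongs to a genuine ID.

```python
import hashlib
import hmac


def site_specific_id(holder_secret: bytes, site: str) -> str:
    """Derive a stable pseudonym for one site; other sites get unrelated values."""
    return hmac.new(holder_secret, site.encode(), hashlib.sha256).hexdigest()


class Registrar:
    """Hypothetical registration endpoint for a single site."""

    def __init__(self, site: str):
        self.site = site
        self.banned = set()

    def register(self, pseudonym: str) -> bool:
        # Registration fails if this pseudonym was banned here before.
        return pseudonym not in self.banned

    def ban(self, pseudonym: str) -> None:
        self.banned.add(pseudonym)


if __name__ == "__main__":
    secret = b"holder-secret"  # stays with the ID holder, never sent to sites
    news = Registrar("news.example")
    pseudonym = site_specific_id(secret, news.site)
    news.ban(pseudonym)
    # Registering again yields the same pseudonym, so the ban holds.
    print(news.register(site_specific_id(secret, news.site)))
```

The same person registering elsewhere presents a different pseudonym, so the ban on one site does not by itself let two sites link the accounts.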

replies(1): >>44383653 #

127. JimDabell ◴[26 Jun 25 01:46 UTC] No.44383506{4}[source]▶

>>44379764 #

That’s the point. Bans should be effective.

replies(1): >>44385932 #

128. Nextgrid ◴[26 Jun 25 02:16 UTC] No.44383653{7}[source]▶

>>44383498 #

Every website would just move on to force people to register. That's already happening - good luck browsing public posts on Twitter/X.

replies(1): >>44384054 #

129. Nextgrid ◴[26 Jun 25 02:22 UTC] No.44383688[source]▶

>>44379782 #

> allows agents to act on behalf of registered users with economic incentives to control usage

There's a huge economy out there based on wasting human time. They explicitly do not want agents acting on behalf of humans, because it means human time is no longer being wasted.

They also don't want to get paid in money, because the money would go to a different profit center. The only payment they accept (because they use that as a metric to justify their salary) is "engagement" aka proof of wasted human time.

replies(1): >>44384526 #

130. Nextgrid ◴[26 Jun 25 02:26 UTC] No.44383706[source]▶

>>44381857 #

It's malicious if your compensation depends on wasting human time. Sadly, a lot of people's careers and compensation do depend on that.

131. chrismorgan ◴[26 Jun 25 02:33 UTC] No.44383743{4}[source]▶

>>44382001 #

Even disregarding the SHA-256 thing, there is unavoidable significant asymmetry and range that renders proof of work unviable. One legitimate user may use a low-end phone, another may have a high-end desktop that can work a hundred or more times as fast whatever technique you use, and an attacker may have a bot net.

It’s important to assume, in security and security-adjacent things, that the attacker has more compute power than the defender. You cannot win in this way.

Proof-of-work is bad rate limiting that relies upon the server having a good estimate of the capabilities of the client. No more, no less.

I bring up the SHA-256 thing as an argument that none of the players in the space are competent. None of them. If you exclude hand-rolled cryptography or known-bad techniques like MD5, SHA-256 is very literally the worst choice remaining: its use in Bitcoin and the rewards available have utterly broken it for this application. If you intend proof of work to actually be the line of defence, you start with something like Argon2d instead. I honestly think that, at this stage, these scripts could replace their proof of work with a “sleep for one second” (maybe adding “or two if I think you’re probably a bot”) routine and have the server trust that they had done so, without compromising their effectiveness.
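To illustrate what a memory-hard proof of work looks like, here is a sketch that swaps in scrypt from Python's standard library, because Argon2 is not in the stdlib. This is a stand-in for the Argon2d suggestion, not an equivalent; the parameters and function names are assumptions, and `hashlib.scrypt` needs a Python build linked against OpenSSL 1.1 or newer.

```python
import hashlib
import itertools


def memory_hard_digest(challenge: bytes, nonce: int, n: int = 2**14) -> bytes:
    # scrypt needs roughly 128 * n * r bytes per evaluation (about 16 MiB here).
    return hashlib.scrypt(str(nonce).encode(), salt=challenge, n=n, r=8, p=1, dklen=32)


def meets_target(digest: bytes, difficulty_bits: int) -> bool:
    """True if the top `difficulty_bits` bits of the digest are all zero."""
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - difficulty_bits) == 0


def solve(challenge: bytes, difficulty_bits: int, n: int = 2**14) -> int:
    """Client side: try nonces until one meets the target."""
    for nonce in itertools.count():
        if meets_target(memory_hard_digest(challenge, nonce, n), difficulty_bits):
            return nonce


def verify(challenge: bytes, nonce: int, difficulty_bits: int, n: int = 2**14) -> bool:
    """Server side: one evaluation, which also costs the server memory."""
    return meets_target(memory_hard_digest(challenge, nonce, n), difficulty_bits)


if __name__ == "__main__":
    nonce = solve(b"hypothetical-challenge", difficulty_bits=4)
    print(nonce, verify(b"hypothetical-challenge", nonce, 4))
```

Each attempt has to touch a large block of memory, which is meant to narrow the gap between commodity hardware and specialised hashing hardware. It does not remove the spread between a low-end phone and a fast desktop described above.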

132. switknee ◴[26 Jun 25 02:36 UTC] No.44383763{7}[source]▶

>>44383143 #

Part of solving that problem is to make it expensive for governments to violate human rights. If spying on everyone is easier than targeted spying, they'll spy on everyone. Governments have a lot of different priorities and it's not always easy to balance them.

Online identity verification is probably best handled by an organization with that as a single priority.

Under the government ID scheme, we have to trust [bad corrupt government] to verify all citizens of [bad corrupt government]. Since that government frequently lies and acts maliciously using every means at their disposal, platforms will treat IDs verified by that government similar to bot traffic and the country will be cut off from the public internet. You'll be banning scientists and journalists from working with others around the world, just because they live in a country with an obnoxious government.

Isn't it also best if people can have multiple identities? Or should someone's contributions to X field be discounted because of their dabbling in fringe Y field?

133. driverdan ◴[26 Jun 25 02:50 UTC] No.44383816[source]▶

>>44381949 #

Absolutely not. You should not want a service to do privacy invasive cross site tracking like that.

replies(1): >>44387305 #

134. userbinator ◴[26 Jun 25 03:33 UTC] No.44383959[source]▶

>>44378127 (OP) #

Or you could just ask it to count how many letters are in certain words.

135. 21ce2 ◴[26 Jun 25 03:54 UTC] No.44384051[source]▶

>>44378450 #

I run a company that relies on bots getting past captchas. It's not hard to get past captchas like this. Anyone with even a medium-sized economic incentive will figure it out. There'll probably be free open-source solutions soon.

136. JimDabell ◴[26 Jun 25 03:55 UTC] No.44384054{8}[source]▶

>>44383653 #

Again, this is a mechanism for making existing auth more resilient.

As you note, websites can already force people to register, so this isn’t adding anything new there.

137. xarope ◴[26 Jun 25 04:07 UTC] No.44384104[source]▶

>>44378127 (OP) #

> Take for example the Stroop task. It's a classic psychology experiment where humans select the color a word is written in and not what the word says. Humans typically show slower responses when the meaning of a word conflicts with its color (e.g., the word "BLUE" written in green), reflecting an overriding of automatic behavior. Bots and AI agents, by contrast, are not subject to such interference and can respond with consistent speed regardless of stimuli.

So I completely disagree with this; you can train yourself to completely ignore the color and just read/act on the word very fast. In fact, this is a game that people play.
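For concreteness, a sketch of how the interference effect in the quoted passage could be measured: mean response time on incongruent trials minus mean on congruent ones. The trial format, the sample numbers, and the `threshold_ms` cutoff are all made up for illustration and are not taken from the article.

```python
from statistics import mean


def stroop_interference_ms(trials):
    """Mean response time on incongruent trials minus mean on congruent trials."""
    congruent = [t["rt_ms"] for t in trials if t["word"] == t["ink"]]
    incongruent = [t["rt_ms"] for t in trials if t["word"] != t["ink"]]
    if not congruent or not incongruent:
        raise ValueError("need both congruent and incongruent trials")
    return mean(incongruent) - mean(congruent)


def shows_interference(trials, threshold_ms: float = 50.0) -> bool:
    # threshold_ms is an invented cutoff, for illustration only.
    return stroop_interference_ms(trials) >= threshold_ms


if __name__ == "__main__":
    # Invented sample data: slower when word and ink color disagree.
    slowed = [
        {"word": "BLUE", "ink": "BLUE", "rt_ms": 600},
        {"word": "RED", "ink": "RED", "rt_ms": 620},
        {"word": "BLUE", "ink": "GREEN", "rt_ms": 700},
        {"word": "RED", "ink": "BLUE", "rt_ms": 740},
    ]
    print(stroop_interference_ms(slowed), shows_interference(slowed))
```

A person who has practised the task until the difference disappears would look flat under this measure too, which is exactly the false-positive risk raised above.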

138. pzo ◴[26 Jun 25 04:57 UTC] No.44384295{5}[source]▶

>>44379354 #

I think Worldcoin added this year (?) identification using government e-passports as well (not only the orb). All modern passports have an NFC/RFID chip; you won't get all the data from it publicly, but you can verify the signature and get basic information. There are already apps in the App Store doing that.

139. sly010 ◴[26 Jun 25 05:22 UTC] No.44384397[source]▶

>>44378127 (OP) #

First off, I always thought the kinds of things described (tracking mouse movements, keypress jitter, etc.) were already done by reCAPTCHA to decide when to present the user with a captcha. I am surprised they are not already doing this.

Second, I am surprised AI agents are this naive. I thought they would emulate human behavior better.

In fact, just based on this article, very little effort has been put into this race on either side.

So I wonder if it has to do with the fact that if companies like Google reliably filtered out bot traffic, they would lose 90% of their ad revenue. This way they have plausible deniability.

replies(1): >>44384504 #

140. fredfish ◴[26 Jun 25 05:43 UTC] No.44384504[source]▶

>>44384397 #

They were very proud of this mouse movement stuff when the desktop was 70% of traffic. It's not worth as much investment as it's been given, since there's no group limiting people to one HID method and removing accessibility from the world.

141. sly010 ◴[26 Jun 25 05:48 UTC] No.44384526{3}[source]▶

>>44383688 #

Nah. You misunderstood. "They" don't make money on human time wasted. They make money on ads served. They don't particularly care if the ads were served to humans or agents, they get paid either way. Bot traffic is actually good for tech companies because it inflates numbers. Captchas are not there to waste our time, but are there to improve their credibility ("We are certain those ad-clicks were real humans because the captcha said so").

142. imiric ◴[26 Jun 25 06:08 UTC] No.44384614{3}[source]▶

>>44382515 #

That might be the way to go. Someone else in the thread mentioned a similar reputation system.

The problem is that such a system could be easily abused or misused. A bad actor could intentionally or mistakenly penalize users, which would have global consequences for those users. So we need a web of trust for the judges as well, and some way of disputing and correcting the mistake.

It would be interesting to prototype it, though, and see how it could work at scale.

replies(2): >>44387293 #>>44389009 #

143. dsekz ◴[26 Jun 25 06:18 UTC] No.44384662[source]▶

>>44378127 (OP) #

Plenty of improvements to mouse movement algorithms have already been made and they’re still evolving. While the blog post and the product it introduces offer some interesting ideas, they don’t yet reach the robustness of modern anti-bot solutions and still trail current industry standards. I doubt it would take me - or any average reverse engineer - more than five seconds to bypass something like this. There are already numerous open source mouse movement libraries available; and even if they didn’t exist, writing one wouldn’t be difficult. Yes, mouse movement or keyboard data can be quite powerful in a modern anti-bot stack and an in depth analysis of it is genuinely valuable, but on its own it’s still insufficient. Relying on this data alone isn’t costly for the attacker and offers little real protection.

replies(1): >>44386879 #

144. BiteCode_dev ◴[26 Jun 25 10:20 UTC] No.44385932{5}[source]▶

>>44383506 #

I get it. And also, I know that Apple and Google would abuse that, and destroy lives and businesses as casually as I eat my breakfast. Then thousands of disposable companies would pop up with valid IDs, and abuse some system (like the terrible DMCA) and make it worse.

If you think people self-censoring on social media is a problem now (the "unlive" newspeak is always such a dystopian hint to me), you have seen nothing.

replies(1): >>44387300 #

145. lofaszvanitt ◴[26 Jun 25 12:26 UTC] No.44386740[source]▶

>>44378326 #

Well, they need a doctor signed key pair that shows that they are legit cripples. Feed it to the browser, win.

146. randomtoast ◴[26 Jun 25 12:29 UTC] No.44386769[source]▶

>>44378127 (OP) #

Isn’t it possible to emulate mouse movements and keypress jitter using a neural network trained on human data in order to simulate human behavior?

147. klaussilveira ◴[26 Jun 25 12:43 UTC] No.44386879[source]▶

>>44384662 #

> they don’t yet reach the robustness of modern anti-bot solutions

Like what?

148. emrehan ◴[26 Jun 25 13:32 UTC] No.44387254[source]▶

>>44378450 #

https://zkpassport.id could be used to prove that you’ve a government ID with NFC capabilities, without revealing anything like your nation.

https://docs.zkpassport.id/faq https://docs.zkpassport.id/examples/personhood

replies(1): >>44387906 #

149. nhecker ◴[26 Jun 25 13:37 UTC] No.44387293{4}[source]▶

>>44384614 #

Hyphanet (formerly Freenet) uses a similar Web of Trust, if you want to see a real-life example in action. Maybe Freenet still uses a WoT as well, I'm not sure.

150. JimDabell ◴[26 Jun 25 13:39 UTC] No.44387300{6}[source]▶

>>44385932 #

Businesses should not be forced to serve abusive users. They should have the choice to refuse to serve somebody permanently. You do not have the right to use somebody else’s service without their permission. If they want you off their platform, they should be able to do so.

The whole point of having trusted issuers is that there aren’t any “disposable companies” who hand out many identities in an uncontrolled manner. If there were, they would quickly become untrusted, making the IDs worthless.

151. dadoum ◴[26 Jun 25 13:39 UTC] No.44387305{3}[source]▶

>>44383816 #

There are cryptographic primitives that let you privately compute the intersection of the certificates you hold and the providers the site trusts, and derive a kind of score, without exposing any of your certificates or which of those providers trusted you. (The only catch is that a website could learn that one specific provider trusted you if it only trusts one, but that could probably be fixed with a better protocol than the one I have in mind.)

152. Saris ◴[26 Jun 25 13:44 UTC] No.44387346[source]▶

>>44378127 (OP) #

I feel like analyzing keystrokes or mouse movements is just going to punish people who use password managers that autofill for them. It does seem like I get more captchas when on sites because of that.
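
To make the concern above concrete, here is a minimal sketch of a naive cadence check. It is not the method from the article or from any real anti-bot product; the function name `looks_automated`, the event format (key-press timestamps in milliseconds), and the 15 ms threshold are all assumptions invented for illustration.

```python
from statistics import pstdev


def looks_automated(typed_len, key_times_ms, min_jitter_ms=15.0):
    """Hypothetical heuristic: return True when input looks bot-like.

    typed_len     -- number of characters that ended up in the field
    key_times_ms  -- timestamps (ms) of the key-press events observed
    min_jitter_ms -- assumed threshold; real systems would tune this
    """
    # Autofill and paste fill the field with few or no key events at all.
    if len(key_times_ms) < typed_len:
        return True

    # Gaps between consecutive key presses.
    gaps = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    if len(gaps) < 2:
        return False  # not enough data to judge either way

    # Humans type unevenly; a near-constant gap suggests a script.
    return pstdev(gaps) < min_jitter_ms


# A password manager fills 12 characters with no key events.
autofill_flagged = looks_automated(12, [])

# A person typing 5 characters with uneven gaps.
human_flagged = looks_automated(5, [0, 130, 210, 390, 450])

# A script "typing" 5 characters at a fixed 50 ms interval.
script_flagged = looks_automated(5, [0, 50, 100, 150, 200])
```

Under these assumptions the autofill case is flagged exactly like the fixed-interval script, while the unevenly typed input passes. That is the false positive described above: a check this simple cannot tell a password manager from a bot unless it also looks at other signals.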

153. jefftk ◴[26 Jun 25 13:47 UTC] No.44387374[source]▶

>>44381949 #

> How about a reputation system where users bring proof that other websites haven't found them abusive.

If you're not careful something like that can subvert the efforts to reduce cross-site tracking, but you can resolve this with thoughtful cryptography: https://privacysandbox.google.com/protections/private-state-...

154. nc0 ◴[26 Jun 25 13:48 UTC] No.44387398{8}[source]▶

>>44382484 #

> Say something everyone lives every day around the world.

> "This is an extremely ignorant take."

replies(1): >>44387471 #

155. PaulHoule ◴[26 Jun 25 13:53 UTC] No.44387442[source]▶

>>44378127 (OP) #

Personally I think CAPTCHAs are harmful. They defend the enshittification economy, preventing the development of tools that protect human users of the web right in a time when it is more practical than ever to develop those tools.

Even in 2009 I knew people who were using neural networks (in PHP no less!) to decode CAPTCHAs with superhuman performance. I see the whole thing as performative, as those things get in my way tens or hundreds of times a day when I browse the web as a human, but in years of webcrawling they didn't give me any trouble until the last two weeks.

156. ◴[26 Jun 25 13:56 UTC] No.44387471{9}[source]▶

>>44387398 #

157. codedokode ◴[26 Jun 25 14:37 UTC] No.44387879[source]▶

>>44378127 (OP) #

Recently I started getting a captcha when trying to use Google, probably because of VPN or Linux. I decided to switch to bing and duckduckgo. Dear Google, go solve your captchas yourself.

158. codedokode ◴[26 Jun 25 14:40 UTC] No.44387906{3}[source]▶

>>44387254 #

I hope people have some self-respect not to show their ID to use a website.

159. codedokode ◴[26 Jun 25 14:45 UTC] No.44387968{3}[source]▶

>>44378709 #

If this gets implemented, the next thing the govt will do is require all websites to store DIDs of visitors for at least 10 years and not accept visitors without them.

160. codedokode ◴[26 Jun 25 14:59 UTC] No.44388108{5}[source]▶

>>44381135 #

I started using duckduckgo when Google started requiring me to solve a captcha to search.

161. fennecbutt ◴[26 Jun 25 16:38 UTC] No.44389004[source]▶

>>44378450 #

I think we'll have to go with ID connected to a real human eventually tbh.

Y'all will balk at that but in a decade or so I think we'll have no other choice.

But even that will fail since certain countries will likely be less precious about their system for this and spammers will still get fake IDs. Same problem as now with phone numbers/RCS spam.

162. fennecbutt ◴[26 Jun 25 16:39 UTC] No.44389009{4}[source]▶

>>44384614 #

Well, apathetic society needs to band together to hold those bad actors to account.

I don't see this ever happening, though.

163. lucb1e ◴[26 Jun 25 17:48 UTC] No.44389630{6}[source]▶

>>44382492 #

If you let good old spam bots loose on a forum, it will outnumber legit messages ten to one easily. If someone uses an LLM to argue about certain topics, that's not spam. It's manipulative, and clearly abusive if there is no human in the loop, but there's no commercial message they're spreading, or at least I'm not seeing any 'buy xyz here <link>', nor off-topic messages

Then again, I noticed a few years ago in a previous "when to report as spam" discussion on HN that this is a lost cause. People will label things they signed up for as spam because they didn't want to receive it anymore, and others defended that behavior. Abuse and manipulation might as well go in that same category of "anything I don't want to read == spam", just know that when you say "spam", there's some people like me who will understand what it used to mean and try to interpret the message as referring to things such as the stereotypical viagra spam

replies(1): >>44389675 #

164. busymom0 ◴[26 Jun 25 17:54 UTC] No.44389675{7}[source]▶

>>44389630 #

They may not exactly be posting 'buy xyz here <link>' but they can use an LLM's human-like sentences to advocate for/against whichever product some company wants. A lot of people rely on Google searching "best monitor for Mac Reddit" to get Reddit results. However, with LLMs, these no longer seem reliable.

replies(1): >>44389683 #

165. lucb1e ◴[26 Jun 25 17:54 UTC] No.44389683{8}[source]▶

>>44389675 #

I guess that can be spam but I'm not seeing a ton of that either, so it's still not obvious to me we're losing this or that war

The thing we're losing is the open web where anyone can view any page and buy any product they can afford without being banned (edit: or, reading back up what this was about, contributing to projects without the input going to /dev/null apparently)
