405 points blindgeek | 3 comments
1. miki123211 No.42175797
As a blind person, I genuinely believe that hCaptcha, being as terrible as it is, is still the best solution among the ones that we can physically achieve in the world as it exists right now.

Audio captchas don't work for people with hearing issues, or for people who don't speak one of your n supported languages, where n is usually <10. I've had to help people out with these over the phone, and it was not fun.

Even for people for whom they do work, it's worth keeping in mind that bots can solve them by now. As a result, users whose activity looks too fraudulent have to be blocked from the audio captchas, even though they are still given access to the visual ones. I have also seen this happen.

Text captchas are a non-option by now: they're very easy to solve with LLMs, and the way they have to be phrased makes it impossible to align LLMs not to solve them, as you can with the visual ones.

Google's reCAPTCHA can get away with having no actual challenge for most users, blind or otherwise, but that's because they're Google: they do enough user tracking that they don't actually need a captcha. Google is the only company that can get away with this, and even for them it doesn't work in all situations, even when the user fully trusts Google and has not adjusted any privacy preferences.

Sure, you could stop using captchas entirely, if you're fine with receiving dozens of Viagra ads on every single platform each day, abolishing all "contact us" and comment forms on the internet, having a significantly higher credit card fraud rate (which translates directly to higher prices and a much worse experience for consumers), and getting all your semi-public records and social media activity immediately scraped by shady companies and sold to anybody who expresses any interest. Unsurprisingly, most users are, in fact, not fine with this.

replies(1): >>42176996 #
2. blindgeek No.42176996
> and getting all your semi-public records and social media activity immediately scraped by shady companies and sold to anybody who expresses any interest.

Public content on the Internet should be scrapable. That's what public means.

The fact that my Reddit posts were publicly available never bothered me, even if they were going to be used to train some LLM. What does bother me is Reddit locking up my posts and making exclusive deals with Google to train Google's LLM.

Preventing scraping isn't good for the average user; it is good for the company that wants to take content created by said user, lock it up, and sell it to their buddies.

replies(1): >>42182848 #
3. miki123211 No.42182848
> Public content on the Internet should be scrapable. That's what public means.

Not necessarily, especially if you want to expose some relationships in one direction while hiding the other.

Imagine your government creates a CNAM-like[1][2] system that lets you enter a phone number and see its owner, so you can check who is calling you and whether a number you've been given is legit. However, they do not want to let you find a person's phone number just by entering their name.

If there's no captcha, an unscrupulous actor, registered in the Seychelles and unconcerned with your country's laws, can just scrape all possible phone numbers and offer a "reverse lookup" service.

In a way, the number/name records are public information: after all, the government lets you query them without authentication. In another way they aren't, because you're only permitted to query them in a certain way.
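
To make the attack concrete, here is a minimal sketch of how a one-way lookup gets inverted by enumeration. Everything in it is made up for illustration: the registry contents, the 7-digit number format, and the names `lookup_owner`, `number_space` and `build_reverse_index` are assumptions, not any real government API.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical registry (number -> owner). The data is invented for this sketch.
_REGISTRY: Dict[str, str] = {
    "5550001": "Alice Example",
    "5550002": "Bob Example",
    "5550004": "Alice Example",
}


def lookup_owner(number: str) -> Optional[str]:
    """The only query the service is meant to allow: number in, owner out."""
    return _REGISTRY.get(number)


def number_space(start: int, stop: int) -> Iterator[str]:
    """Enumerate every candidate number in [start, stop) as a 7-digit string."""
    for n in range(start, stop):
        yield f"{n:07d}"


def build_reverse_index(
    lookup: Callable[[str], Optional[str]],
    numbers: Iterable[str],
) -> Dict[str, List[str]]:
    """Call the forward lookup on every candidate and invert the results."""
    reverse: Dict[str, List[str]] = {}
    for number in numbers:
        owner = lookup(number)
        if owner is not None:
            reverse.setdefault(owner, []).append(number)
    return reverse


# With nothing limiting the request rate, the scraper walks the whole range
# and ends up with the name -> numbers mapping the service wanted to hide.
reverse = build_reverse_index(lookup_owner, number_space(5550000, 5550010))
print(reverse["Alice Example"])  # ['5550001', '5550004']
```

The point of the sketch is that the forward query alone is enough: the number space is small and enumerable, so the only thing standing between "number to name" and "name to number" is the cost of making each request, which is what a captcha raises.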

Variations of this problem have appeared many times, particularly across Europe, usually with company numbers, property deeds and such.