454 points positiveblue | 12 comments
1. jmtame No.45066514
I pretty much use Perplexity exclusively at this point, instead of Google. I'd rather just get my questions answered than navigate all of the ads and slowness that Google provides. I'm fine with paying a small monthly fee, but I don't want Cloudflare being the gatekeeper.

Perhaps a way to serve ads through the agents would be good enough. I'd prefer that to be some open protocol rather than something controlled by a company.

replies(4): >>45067765 >>45071314 >>45073000 >>45073391
2. verdverm No.45067765
Perplexity has been one of the AI companies that created the problem that gave rise to this CF proposal. Why doesn't Perplexity invest more into being a responsible scraper?

https://blog.cloudflare.com/perplexity-is-using-stealth-unde...

replies(1): >>45068273
3. jmtame No.45068273
Re-read what I wrote.
replies(1): >>45079377
4. Fabricio20 No.45071314
This has been my experience recently as well; I've finally migrated from Google to Brave Search, since Google was just slow for me.

I also appreciate the AI search results a bit when I'm looking for something very specific (like what the YAML definition for a Docker Swarm deployment constraint looks like), because the AI just gives me the snippet, while the search results are 300 Medium blog posts about how to use Docker and none of them explain the variables or what each one does. Even the official Docker documentation website is a mess to navigate and find anything relevant!

replies(1): >>45075833
5. jeroenhd No.45073000
Perplexity is the problem Cloudflare and companies like it are trying to solve. The company refuses to take no for an answer and will mislead and fake their way through until they've crawled the content they wanted to crawl.

The problem isn't just that ads can't be served. It's that every technical measure that attempts to block their service produces new ways of misleading website owners and the services they use. Perplexity refuses to cooperate with any attempt to detect and prevent abuse coming from their servers.

None of this would've been necessary if companies like Perplexity had just acted like a responsible web service and told their customers "sorry, this website doesn't allow Perplexity to act on your behalf".

The open protocol you want already exists: it's the user agent. A responsible bot will set the correct user agent, maybe follow the instructions in robots.txt, and leave it at that. Companies like Perplexity (and many (AI) scrapers) don't want to participate in such a protocol. They will seek out and abuse any loopholes in any well-intended protocol anyone can come up with.
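A minimal Python sketch of the user-agent plus robots.txt convention described above, using only the standard library's `urllib.robotparser` and `urllib.request`. The bot name `ExampleBot`, its contact URL, the sample robots.txt, and the helpers `load_rules` and `polite_request` are all hypothetical; a real crawler would download each site's `/robots.txt` rather than parse a hard-coded string, and robots.txt is advisory, so this only shows what a cooperative bot chooses to do.

```python
from typing import Optional
from urllib.request import Request
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity: a stable product token plus a contact URL.
BOT_USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot)"

# Hypothetical robots.txt: ExampleBot is refused everywhere,
# every other agent is only kept out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched from the site."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_request(rules: RobotFileParser, url: str,
                   user_agent: str = BOT_USER_AGENT) -> Optional[Request]:
    """Return a Request carrying an honest User-Agent, or None if robots.txt says no."""
    if not rules.can_fetch(user_agent, url):
        # Respect the refusal instead of retrying under a disguised identity.
        return None
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    print(polite_request(rules, "https://example.com/articles/1"))  # None
    req = polite_request(rules, "https://example.com/articles/1", "OtherBot/2.0")
    print(req.get_header("User-agent"))  # OtherBot/2.0
```

`RobotFileParser` picks a rule group when that group's `User-agent` token appears (case-insensitively) in the part of the crawler's user-agent string before the first `/`. That is why an honest, stable token matters: a bot that rotates or spoofs its user agent never matches the rules written for it.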

I don't think anyone wants Cloudflare to have even more influence on the internet, but it's thanks to the growth of inconsiderate AI companies like Perplexity that these measures are necessary. The protocol Cloudflare proposes is open (it's just a signature); the problem people have with it is that they have to ask Cloudflare nicely to permit website owners to track and prevent abuse from bots. For any Azure-gated websites, your bot would need to ask permission there as well, as with Akamai-gated websites, and maybe even individual websites.

A new protocol is a technical solution. Technical solutions work for technical problems. The problem Cloudflare is trying to solve isn't a technical problem; it's a social problem.

replies(1): >>45075817
6. rs_rs_rs_rs_rs No.45073391
>but I don't want Cloudflare being the gatekeeper

Cloudflare is not the gatekeeper, it's the owner of the site that blocks Perplexity that's "gatekeeping" you. You're telling me that's not right?

replies(1): >>45075775
7. jmtame No.45075775
Cloudflare is a gatekeeper because they’re trying to insert themselves between the owner and the end-user. Despite all the altruistic signaling, they really just want to capitalize on AI, and they’re happy to do that even if it results in a subpar experience for the end-user. They started this with a focus on news organizations, so I’m not particularly excited about trying to block AI access and lock down the web through one private company just so we can preserve '90s-era clickbait businesses.
replies(1): >>45076196
8. jmtame No.45075817
You’re referencing an old and outdated technology that has no capability to handle things like revenue and attribution. New protocols will need to evolve to the current use. Owners want money, so make the protocol focused on that use case.

I’m not here to propose a solution. I’m here as an end-user saying I won’t go back to the old experience, which is outdated and broken.

9. jmtame No.45075833
Not to mention how much worse it is on mobile. Every web site asks me to accept their cookies, close layers of ads with tiny buttons, and loads slowly with ads spread throughout the content. And that’s just to figure out if I’m even on the right page.
replies(1): >>45081028
10. rs_rs_rs_rs_rs No.45076196
>Cloudflare is a gatekeeper because they’re trying to insert themselves between the owner and the end-user

But they can't insert themselves without the owner directly adding them. So it's the owner that's doing the gatekeeping (regardless of whether it's Cloudflare or iptables rules).

I think all you AI people blaming Cloudflare are just trying to deflect from the actual problem, which is that more and more owners don't want AI crawlers going through their content.

If Cloudflare disappears, who are you going to blame next: the iptables developers, maybe Linus Torvalds?

11. verdverm No.45079377
And what am I supposed to glean from the re-read?

What did you say that relates to Perplexity being one of the reasons that Cloudflare and their customers have decided they need better protection from abusive scrapers?

Websites choose their own gatekeepers; Cloudflare is just one provider.

12. maltelandwehr No.45081028
The horrible UX on mobile, especially when traveling to another country and having to deal with forced geo-redirects, is the main reason I have largely replaced Google with ChatGPT for my everyday search needs.