
454 points positiveblue | 2 comments | source
zzo38computer ◴[] No.45068855[source]
With what they say about authorization, I think X.509 would help. (Although central certificate authorities are often used with X.509, it does not have to be that way; the service you are operating can issue the certificate to you instead, or they can accept a self-signed certificate which is associated with you the first time it is used to create an account on their service.)
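The "self-signed certificate associated with you on first use" idea is essentially trust-on-first-use (TOFU) pinning. A minimal sketch, assuming a toy in-memory account store (the dict and function names here are illustrative, not any real service's API):

```python
# Minimal sketch of trust-on-first-use (TOFU) for self-signed client
# certificates: the first certificate seen for an account is pinned by
# fingerprint; later connections must present the same one. The account
# store is an in-memory dict purely for illustration.
import hashlib

_pinned: dict = {}  # account name -> certificate fingerprint

def fingerprint(der_cert: bytes) -> str:
    """SHA-256 over the DER certificate bytes, as returned by
    ssl.SSLSocket.getpeercert(binary_form=True)."""
    return hashlib.sha256(der_cert).hexdigest()

def check_client(account: str, der_cert: bytes) -> bool:
    """Pin on first use; afterwards require an exact match."""
    fp = fingerprint(der_cert)
    if account not in _pinned:
        _pinned[account] = fp      # first use: associate cert with account
        return True
    return _pinned[account] == fp  # later uses: must match the pinned cert
```

No certificate authority is involved at all; the service only remembers which key created the account.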

You can use the admin certificate issued to you to issue a certificate to the agent, containing an extension that limits what it can be used for (it might also expire within a few hours, and might be revoked later). That certificate can in turn be used to issue an even more restricted certificate to sub-agents.
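The delegation rule can be modeled abstractly (this is not real X.509; in practice the scope would live in a custom certificate extension and expiry in the validity period, enforced at the TLS layer, and all names here are made up for illustration):

```python
# Toy model of certificate delegation: each "certificate" carries a
# scope set and an expiry, and a child certificate may only narrow
# what its issuer already holds. Real X.509 would carry the scope in
# a custom extension and the expiry in notAfter.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cert:
    subject: str
    scopes: frozenset   # what this certificate authorizes
    expires: int        # Unix time

def issue(parent: Cert, subject: str, scopes: set, expires: int) -> Cert:
    """Issue a child certificate; it can only narrow the parent."""
    if not scopes <= parent.scopes:
        raise ValueError("child scope exceeds issuer's authorization")
    if expires > parent.expires:
        raise ValueError("child cannot outlive issuer")
    return Cert(subject, frozenset(scopes), expires)

# admin -> agent -> sub-agent, each step strictly narrower
admin = Cert("admin", frozenset({"read", "write", "deploy"}), 2_000_000_000)
agent = issue(admin, "agent", {"read", "write"}, 1_900_000_000)
sub   = issue(agent, "sub-agent", {"read"}, 1_800_000_000)
```

The point is that the service only needs to verify the chain and intersect the scopes; no round-trip to a central authority is required at delegation time.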

This is already possible (and would be better than the "fine-grained personal access tokens" that GitHub uses), but does not seem to be commonly implemented. It also improves security in other ways.

So, it can be done in such a way that Cloudflare does not need to issue authorization to you, or necessarily to be involved at all. Google does not need to be involved either.

However, that is only for things that should normally require authorization anyways. Reading public data is not something that should require authorization; the problem there is excessive scraping (there seem to be too many LLM scrapers, and others, that are far too aggressive) and excessive blocking (e.g. of someone using a different web browser, or curl to download one file, or even someone using a common browser and configuration when something strange and unexpected happens). The above is unrelated to that, so certificates and the like do not help, because they solve a different problem.

replies(1): >>45072866 #
1. jeroenhd ◴[] No.45072866[source]
What problem does this solve that a basic API key doesn't solve already? The issue with that approach is that you will require accounts/keys/certificates for all hosts you intend to visit, and malicious bots can create as many accounts as they need. You're just adding a registration step to the crawling process.

Your suggested approach works for websites that want to offer AI access as a service to their customers, but the problem Cloudflare is trying to solve is that most AI bots are doing things that website owners don't want them to do. The goal is to identify and block bad actors, not to make things easier for good actors.

Using mTLS/client certificates also exposes people (that don't use AI bots) to the awful UI that browsers have for this kind of authentication. We'll need to get that sorted before an X509-based solution makes any sense.

replies(1): >>45077076 #
2. zzo38computer ◴[] No.45077076[source]
> What problem does this solve that a basic API key doesn't solve already?

Many things, including improved security, and the possibility of delegating authorization in the ways described in their article (provided you do not restrict the certificate from issuing further certificates, and you define and document an extension for your service that specifies narrower authorization).

> The issue with that approach is that you will require accounts/keys/certificates for all hosts you intend to visit, and malicious bots can create as many accounts as they need. You're just adding a registration step to the crawling process.

Read the last paragraph of what I wrote, which explains why that issue does not apply. However, even where registration is required (and I say it should not be for most things, especially read-only access), it does not have to be fast or automatic.

> Your suggested approach works for websites that want to offer AI access as a service to their customers, but the problem Cloudflare is trying to solve is that most AI bots are doing things that website owners don't want them to do. The goal is to identify and block bad actors, not to make things easier for good actors.

The approach I describe would work for many things where authentication and authorization help (most of which do not involve AI).

I do know that it does not solve the problem that Cloudflare is trying to solve, but it does what it says in the article about authorization, and in a secure way. And, it is open, interoperable, and standardized.

The problem that Cloudflare is trying to solve cannot be solved in this way, and the way Cloudflare tries to do it is not good either.

Things that AI bots are doing to others' sites include excessive scraping, rather than accessing private data (even if they sometimes do that too, Cloudflare's solution won't help with that either). (There is also excessive blocking, but Cloudflare is part of that problem, even if some of the things they do sometimes help.)

See comment 45068556; not everything should require authentication or authorization. Many other comments also mention why it does not help.

> Using mTLS/client certificates also exposes people (that don't use AI bots) to the awful UI that browsers have for this kind of authentication. We'll need to get that sorted before an X509-based solution makes any sense.

OK, that is a valid point, but it could be improved independently. Before it is fixed (and even afterward, if wanted), X.509 could be offered as only one type of authentication; the service could also allow a username/password (and/or other things, such as TOTP) for people who do not want to use X.509.
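Making client certificates optional rather than required is already expressible in standard TLS stacks. A sketch using Python's stdlib ssl module (the file paths are placeholders, not real files):

```python
# Sketch of a server-side TLS context that *requests* but does not
# require a client certificate, so username/password (or TOTP) logins
# keep working for clients that present no certificate. File-path
# arguments are placeholders the caller must supply.
import ssl

def make_server_context(cert_file: str, key_file: str,
                        client_ca_file: str) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(cert_file, key_file)      # server's own identity
    ctx.load_verify_locations(client_ca_file)     # client certs we trust
    ctx.verify_mode = ssl.CERT_OPTIONAL  # requested, not required
    return ctx

# After the handshake the application would check:
#   peer = conn.getpeercert()
#   if peer: authenticate via the certificate
#   else:    fall back to username/password or TOTP
```

With CERT_OPTIONAL the handshake succeeds either way; only the application-level check differs, so the browser-UI problem affects only users who opt into certificates.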

Also, AI bots are not the only kind of automated access (and not one that I use personally, although other people might); you could also be using an API for other purposes, or a command-line program for manual access without a web browser, etc.