My rule of thumb from past experience is that if you demand 99.9% uptime for your own systems and you run an in-house auth system, then that auth system must have 99.99% reliability. If you are serving auth for OTHERS, then you need a system that can absolutely never be down, and at that point five nines becomes the baseline requirement.
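For concreteness, here's the downtime budget each of those levels leaves you per year (plain arithmetic, nothing service-specific):

```python
# Allowed downtime per year at each availability level.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for availability, label in [(0.999, "99.9%"), (0.9999, "99.99%"), (0.99999, "99.999%")]:
    budget = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label}: {budget:.1f} minutes/year")

# 99.9%  : ~525.6 minutes/year (~8.8 hours)
# 99.99% : ~52.6 minutes/year
# 99.999%: ~5.3 minutes/year
```

Five nines means roughly five minutes of total downtime per year, which is why it's the baseline when others depend on you.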
Auth is a critical-path component. If your service is in the critical path for third parties in both reliability and latency[ß], then every one of your failures is magnified by the number of customers it hits.
ß: The current top-voted comment thread mentions that latency and response time should also be part of an SLA concern. I agree. For any hot-path system you must always be tracking the latency distribution, both from the service's own viewpoint AND from the point of view of the outside world. The typically useful metrics for that are p95, p99, p999, and max. Yes, max is essential to include: you always want to know the worst experience someone (or something) had during any given time window.
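As a sketch of what that could look like per time window, here are nearest-rank percentiles over raw samples; a real high-volume service would more likely feed an HDR histogram or t-digest, but the reported quantiles are the same idea:

```python
import math

def latency_summary(samples_ms):
    """Summarize one time window of request latencies (milliseconds)."""
    s = sorted(samples_ms)

    def pct(p):
        # Nearest-rank percentile: the value at 1-based rank ceil(p * n).
        return s[min(len(s) - 1, math.ceil(p * len(s)) - 1)]

    return {
        "p95": pct(0.95),
        "p99": pct(0.99),
        "p999": pct(0.999),
        "max": s[-1],  # the single worst experience in this window
    }
```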
All of those significantly influence response capability in a way that makes tracking latency next to useless. Maybe there is something we can do, though. In more than a couple of scenarios we do have tracking, metrics, and alerting in place; it just doesn't end up in our SLA.
The same can apply to latency. What is the latency of requests to your system, including dependencies you choose and excluding dependencies the customer chooses? The network leg from the customer or user to your system is a bit of a gray area. The simplest thing to do is measure each request's latency from the point of view of your backend rather than the initiator. This is probably good enough, although in theory it lets you off the hook a bit too easily: to some extent you can choose whether you run near the initiator and how many round trips are required, and servers can underestimate their own latency or miss requests entirely during failures. But it's not fair to fail your SLA because of end-user bufferbloat, bad wifi, a crappy ancient Chromebook with too many open tabs, a customer webapp server's GC spiral, or whatever. It's basically impossible to make any 99.999% promises when those things are in play.
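A minimal sketch of that backend-viewpoint measurement, assuming a generic handler-wrapping style; `handler` and `record` are placeholders for whatever your stack provides:

```python
import time

def timed(handler, record):
    """Wrap a request handler and record its latency as the server sees it.

    This clocks from the request reaching the handler to the response being
    handed back, so it deliberately excludes the network leg to the
    initiator, and it records nothing at all if the process dies
    mid-request: exactly the blind spots noted above.
    """
    def wrapper(request):
        start = time.monotonic()
        try:
            return handler(request)
        finally:
            record((time.monotonic() - start) * 1000.0)  # milliseconds
    return wrapper
```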
My preferred form of SLO is: x% of requests given y ms succeed within y ms, measured by my server. ("given" meaning "does not have an upfront timeout shorter than" and "isn't aborted by the client before".) I might offer a few such guarantees for a particular request type (a sketch of checking one of these follows the list), e.g.:
* 50% of lookups given 1 ms succeed within 1 ms.
* 99% of lookups given 10 ms succeed within 10 ms.
* 99.999% of lookups given 500 ms succeed within 500 ms.
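To make the "given" clause concrete, here's a minimal sketch of evaluating one such guarantee over a window of server-side request records; the `Request` fields are illustrative assumptions, not any particular system's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    latency_ms: float                    # measured by the server
    succeeded: bool
    client_timeout_ms: Optional[float]   # upfront client deadline, if any
    aborted_at_ms: Optional[float]       # when the client aborted, if it did

def slo_met(requests, pct, budget_ms):
    """Check 'pct% of requests given budget_ms succeed within budget_ms'.

    'Given budget_ms' filters to requests whose upfront timeout is at least
    budget_ms and that the client didn't abort before budget_ms, per the
    definition above.
    """
    eligible = [
        r for r in requests
        if (r.client_timeout_ms is None or r.client_timeout_ms >= budget_ms)
        and (r.aborted_at_ms is None or r.aborted_at_ms >= budget_ms)
    ]
    if not eligible:
        return True  # vacuously met; no eligible traffic in this window
    good = sum(1 for r in eligible if r.succeeded and r.latency_ms <= budget_ms)
    return good / len(eligible) >= pct / 100.0
```

So `slo_met(window, 99, 10)` checks the second bullet above, and `slo_met(window, 99.999, 500)` the third.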
I like to also have client-side and whole-flow measurements but I'm much more cautious about promising anything about them.