> So we went from a latency of 1 second to 0.091 seconds which is an 11 times improvement.
There's your problem -- you should never allow unbounded queue growth at high utilization. In a simple M/M/1 model, mean latency scales as 1/(1 - utilization), so going from 80% to 90% utilization doubles your average wait times. You can make the improvement arbitrarily large by pushing utilization closer to 1, e.g. "We halved service time at 99.99% utilization and saw a 10000x improvement". But that's not interesting, as your users will complain that your service is unusable under heavy load long before you get to that point.
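To make the arithmetic concrete, here is a rough sketch assuming an M/M/1 queue, where mean time in system is service_time / (1 - utilization). The function name `mm1_latency` and the specific parameters (0.1 s service time, 9 requests/s) are my own assumptions, picked only because they reproduce the quoted 1 s and 0.091 s figures.

```python
def mm1_latency(service_time, arrival_rate):
    """Mean time in system (waiting + service) for an M/M/1 queue."""
    utilization = arrival_rate * service_time
    if utilization >= 1:
        raise ValueError("queue is unstable at utilization >= 1")
    return service_time / (1 - utilization)


def speedup_from_halving(service_time, arrival_rate):
    """Latency ratio before/after halving the service time at a fixed arrival rate."""
    before = mm1_latency(service_time, arrival_rate)
    after = mm1_latency(service_time / 2, arrival_rate)
    return before / after


# Assumed parameters: 0.1 s service time at 9 req/s, i.e. 90% utilization.
print(mm1_latency(0.1, 9.0))             # about 1 second
print(mm1_latency(0.05, 9.0))            # about 0.091 seconds
print(speedup_from_halving(0.1, 9.0))    # about 11x

# Same change at 99.99% utilization: the ratio is on the order of 10000x.
print(speedup_from_halving(1.0, 0.9999))

# 80% -> 90% utilization doubles mean latency in this model.
print(mm1_latency(1.0, 0.9) / mm1_latency(1.0, 0.8))
```

The ratio works out to (2 - utilization) / (1 - utilization), which is why it blows up as utilization approaches 1.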
The typical "fix" is to add load shedding (e.g. based on queue length) combined with some intelligent backoff logic at the client (to reduce congestion), and call it a day. This has its own downsides, e.g. increased latency for everyone in cases of overload. Or, if your configured queue length is too large, you get bufferbloat.
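A minimal sketch of that combination, assuming a queue-length threshold on the server and exponential backoff with jitter on the client. `SheddingQueue`, `backoff_delay`, and the default `base`/`cap` values are hypothetical names and numbers for illustration, not anything from a particular framework.

```python
import queue
import random


class SheddingQueue:
    """Bounded FIFO that rejects new work instead of growing without limit."""

    def __init__(self, max_len):
        self._q = queue.Queue(maxsize=max_len)

    def offer(self, request):
        """Return True if the request was queued, False if it was shed."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            return False

    def take(self):
        """Remove and return the oldest queued request (raises queue.Empty if none)."""
        return self._q.get_nowait()


def backoff_delay(attempt, base=0.1, cap=10.0):
    """Client-side delay in seconds before retry number `attempt` (0-based).

    Uses exponential growth capped at `cap`, with the delay drawn uniformly
    from [0, limit] so that rejected clients do not all retry at once.
    """
    limit = min(cap, base * 2 ** attempt)
    return random.uniform(0, limit)
```

Picking `max_len` is the hard part: too small and you shed work you could have served, too large and you are back to bufferbloat.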
(I have seen an argument for using LIFO instead of FIFO, which achieves much more predictable median performance at the expense of causing unbounded badness in the worst case. For this, your client needs to set deadlines, which it should probably be doing anyway.)
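Roughly, the LIFO-with-deadlines idea could look like the sketch below. It assumes each request carries an absolute deadline on the same clock the server uses; `DeadlineLifo` and its injectable `clock` parameter are hypothetical, chosen to keep the example small.

```python
import time


class DeadlineLifo:
    """Serve the newest request first and drop requests whose deadline has passed."""

    def __init__(self, clock=time.monotonic):
        self._stack = []
        self._clock = clock

    def push(self, request, deadline):
        """Queue a request with an absolute deadline (same units as the clock)."""
        self._stack.append((deadline, request))

    def pop(self):
        """Return the newest unexpired request, or None if nothing is left."""
        now = self._clock()
        while self._stack:
            deadline, request = self._stack.pop()
            if deadline > now:
                return request
            # Expired: the client has already given up, so skip the work.
        return None
```

Under overload the old requests at the bottom of the stack are the ones that starve, which is where the bad worst case comes from. The deadline check is what keeps the server from eventually doing that work for nobody.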