
130 points by luu | 1 comment | source
kqr ◴[] No.40715797[source]
Another of those somewhat counter-intuitive results is the answer to the question "how much do we need to scale up to avoid response time regression when the request rate is doubled?"

It is very easy to blurt out "well obviously we need twice the processing power!" but if we scale to twice the processing power, then start accepting twice the request rate – we will actually be serving each request in half the time we originally did.

To many people that sounds weird; it sounds like we got something for nothing. If I invite twice as many people to a party and buy twice as many cookies, it's not like each guest will get twice as many cookies – that just leaves the originally planned number of cookies for each guest.

But for response time it comes back to the first equation in TFA:

    T = 1/μ · 1/(1 - ρ)
Doubling both arrival rate and maximum service rate leaves ρ – and the second factor with it – unchanged, but still halves the 1/μ factor, resulting in half the response time.
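
To make that concrete with made-up numbers (not from TFA): take μ = 100 requests/s and λ = 50 requests/s, so ρ = 0.5. Then

    T  = 1/100 · 1/(1 - 0.5) = 20 ms
    T' = 1/200 · 1/(1 - 0.5) = 10 ms

after doubling both rates – same ρ, half the 1/μ.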

The appropriate amount to scale by is the k that solves the equation we get when we set the old response time T at request rate λ equal to the one at request rate 2λ and kμ processing power. This is something like

    T = 1/μ · 1/(1 - λ/μ) = 1/kμ · 1/(1 - 2λ/kμ)
but the two sides collapse to 1/(μ - λ) and 1/(kμ - 2λ), so kμ = μ + λ and it rearranges to the much simpler

    k = ρ + 1
which, a colleague of mine told me, can be interpreted intuitively as "the processing power that needs to be added is exactly that which will handle an additional unit of the current utilisation on top of the current utilisation, i.e. twice as much."

This is mostly good news for people doing capacity planning in advance of events etc. If you run your systems at reasonable utilisation levels normally, you don't actually need that much additional capacity to handle peak loads.
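
The same algebra works for any traffic multiplier m, not just doubling: equating 1/(μ - λ) with 1/(kμ - mλ) gives k = 1 + (m - 1)ρ. A minimal Python sketch of the planning rule (my own illustration, nothing from TFA):

    def scale_factor(rho: float, m: float = 2.0) -> float:
        """Capacity multiplier k that keeps M/M/1 response time
        unchanged when the arrival rate is multiplied by m.
        Derived from 1/(mu - lam) == 1/(k*mu - m*lam)."""
        return 1.0 + (m - 1.0) * rho

    print(scale_factor(0.5))       # 1.5 – doubling traffic at 50% utilisation
    print(scale_factor(0.5, m=3))  # 2.0 – tripling traffic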

replies(6): >>40716157 #>>40716287 #>>40716860 #>>40717110 #>>40717333 #>>40719853 #
krisoft ◴[] No.40716860[source]
> It is very easy to blurt out "well obviously we need twice the processing power!" but if we scale to twice the processing power, then start accepting twice the request rate – we will actually be serving each request in half the time we originally did.

I don't understand what you are saying. Are you talking about the time the request is buffered in some queue on average assuming they are arriving randomly? Or something like that?

Here is what I'm thinking. We are operating a hospital which does only one type of surgery, which lasts exactly an hour. (Presumably it is a veterinary practice for spherical cows.) A fully staffed operating room can operate on 24 spherical cows a day. If, due to a spherical cow jamboree, we expect more patients and we set up a second operating theatre, we will be able to serve 48 of them a day. But we are still serving them for an hour each (because that is how long the operation takes).

Even if we are talking about latency: when 24 cows show up at the stroke of midnight to the one-operating-room hospital, they will each be served in 12.5 h on average (the first after 1 h, the last after 24 h). Same average if 48 cows show up to the two-operating-room hospital.

So what am I thinking wrong here? Is "scale to twice the processing power" not the same as getting a second operating room? I'm not seeing where "we will actually be serving each request in half the time" comes from.

replies(2): >>40716961 #>>40717412 #
NooneAtAll3 ◴[] No.40717412[source]
I think I finally understood that math

in queueing theory, you don't expect "operating rooms" to operate 24 hours per day – spherical patients arrive randomly, so there may be gaps that leave the room idle for a while, then a burst happens, and it averages out

doubling the cow input doesn't mean each "burst" becomes twice as big – some of the new cows simply fall into periods that previously went unused

thus the second batch of patients doesn't need a whole second copy of all the operating rooms – part of it gets absorbed into the idle timeslots of the existing resources
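
a quick way to convince yourself is to simulate it – here's a rough sketch (mine, assuming Poisson arrivals and exponential service times, i.e. the M/M/1 model upthread):

    import random

    def mm1_mean_response(lam, mu, n=200_000, seed=1):
        """Mean time in system for an M/M/1 queue, by simulation.
        Arrivals are Poisson(lam); service times are Exp(mu)."""
        rng = random.Random(seed)
        t = 0.0          # arrival clock
        free_at = 0.0    # when the single server next becomes idle
        total = 0.0
        for _ in range(n):
            t += rng.expovariate(lam)               # next arrival
            start = max(t, free_at)                 # wait if server busy
            free_at = start + rng.expovariate(mu)   # service completes
            total += free_at - t                    # time in system
        return total / n

    print(mm1_mean_response(50, 100))   # ~0.020 s (theory: 1/(μ-λ) = 20 ms)
    print(mm1_mean_response(100, 200))  # ~0.010 s – doubled rates, half the time

doubling both rates halves the mean time in system, exactly because those idle gaps soak up part of the extra load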

replies(2): >>40718134 #>>40720078 #