4 points | dav43 | 2 comments

I have heard this claim mentioned online and in casual conversation:

**During business hours these endpoints may get so highly utilised that accuracy and quality decline (ignoring latency), i.e. the effective available token capacity is reduced.**

Has anyone run studies/experiments showing, across a large dataset, whether the performance of these endpoints changes between peak-usage and low-usage hours? (I assume some statistical significance test would be required.)
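For what it's worth, the significance test part is straightforward to sketch: repeatedly hit the endpoint with a fixed benchmark during peak and off-peak windows, score each run, then test whether the difference in mean score could be chance. Below is a minimal, self-contained sketch using a two-sample permutation test in pure Python; the score lists are hypothetical placeholder data, not real measurements.

```python
import random
import statistics

def permutation_test(peak, off_peak, n_perm=10_000, seed=0):
    """Two-sample permutation test on the difference of means.

    Returns the observed mean difference (off_peak - peak) and a
    two-sided p-value: the fraction of random label shuffles that
    produce a difference at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.mean(off_peak) - statistics.mean(peak)
    pooled = list(peak) + list(off_peak)
    n_peak = len(peak)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[n_peak:]) - statistics.mean(pooled[:n_peak])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical per-run accuracy scores (0..1) on the same eval set.
# In a real study these would come from scheduled runs of identical
# prompts against the endpoint at different times of day.
peak_scores = [0.78, 0.81, 0.75, 0.80, 0.77, 0.79, 0.74, 0.82]
off_peak_scores = [0.83, 0.85, 0.80, 0.86, 0.84, 0.82, 0.81, 0.87]

diff, p_value = permutation_test(peak_scores, off_peak_scores)
print(f"mean difference: {diff:.4f}, p-value: {p_value:.4f}")
```

The hard part isn't the statistics; it's controlling confounds (model version changes, temperature, prompt caching) and collecting enough runs per window for the test to have power.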

I don't have the knowledge to answer this question or even to judge whether it's a valid hypothesis. Anyone got resources on this? I could only find tangential mentions, like in the links below.

[0] https://arxiv.org/html/2507.18007v1 "These challenges contribute to bottlenecks during peak workloads, ultimately affecting inference service quality, scalability, and responsiveness, which requires accurate resource profiling for LLM inference task."

[1] Asking Gemini - https://share.google/aimode/bWb5w9dpf2ZeggqSj

1. cranberryturkey:
I've wondered this too actually.