I don't have any reason to doubt the reasoning this article is doing or the conclusions it reaches, but it's important to recognize that this article is part of a sales pitch.
But the thrust of the article is that, contrary to conventional wisdom, we shouldn't expect LLMs to keep getting more efficient, and so it's worthwhile to explore other options for cost savings in inference, such as batch processing.
The conclusion they reach directly serves what they're selling.
I'll repeat: I'm not disputing anything in this article. I'm really not, and I'm not being coy or making allusions without saying anything directly. If I thought this was bullshit, I wouldn't be afraid to semi-anonymously post a comment saying so.
But this is advertising, just like Backblaze's hard drive reliability blog posts are advertising.
> API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
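As a rough sanity check on that quoted formula, here's a minimal back-of-envelope sketch in Python. All the numbers (GPU hourly cost, throughput, margin) are illustrative assumptions of mine, not figures from the article:

```python
# Back-of-envelope sketch of the quoted pricing formula.
# Every number below is an assumed, illustrative value.

hourly_hardware_cost = 2.00   # assumed $/hour for one GPU instance
tokens_per_second = 1_000     # assumed aggregate serving throughput
tokens_per_hour = tokens_per_second * 3600

margin_per_million = 0.50     # assumed provider margin, $ per 1M tokens

# API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
cost_per_token = hourly_hardware_cost / tokens_per_hour
price_per_million = cost_per_token * 1_000_000 + margin_per_million

print(f"Hardware cost per 1M tokens: ${cost_per_token * 1e6:.2f}")
print(f"Implied API price per 1M tokens: ${price_per_million:.2f}")
```

With these made-up inputs, hardware alone works out to about $0.56 per million tokens, so margin ends up being roughly half of the implied price. The point of the formula is that if hardware cost and throughput plateau, the only levers left are margin and utilization, which is where things like batch processing come in.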