
113 points sethkim | 5 comments
1. sharkjacobs No.44457899
> If you’re building batch tasks with LLMs and are looking to navigate this new cost landscape, feel free to reach out to see how Sutro can help.

I don't have any reason to doubt the article's reasoning or the conclusions it reaches, but it's important to recognize that this article is part of a sales pitch.

replies(2): >>44458078 #>>44458230 #
2. sethkim No.44458078
Yes, we're a startup! And LLM inference is a major component of what we do. More importantly, we're working on making these models accessible as analytical processing tools, so we have a strong focus on making them cost-effective at scale.
replies(1): >>44458798 #
3. samtheprogram No.44458230
There’s absolutely nothing wrong with putting a small plug at the end of an article.
replies(1): >>44458668 #
4. sharkjacobs No.44458668
Of course not.

But the thrust of the article is that, contrary to conventional wisdom, we shouldn't expect LLMs to keep getting more efficient, and so it's worthwhile to explore other options for cost savings in inference, such as batch processing.

The conclusion they reach is one which directly serves what they're selling.

I'll repeat: I'm not disputing anything in this article. I'm really not, and I'm not even trying to be coy or make allusions without saying anything directly. If I thought this was bullshit, I wouldn't be afraid to semi-anonymously post a comment saying so.

But this is advertising, just like Backblaze's hard drive reliability blog posts are advertising.

5. sharkjacobs No.44458798
I see your prices page lists the average cost per million tokens. Is that because you are using the formula you describe, which depends on hardware time and throughput?

> API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
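To make the arithmetic concrete, here's a minimal sketch of that formula in Python. The hardware cost, throughput, and margin figures below are made up for illustration; they aren't Sutro's actual numbers.

    # Rough sketch of the quoted pricing formula.
    # All numbers used in the example are hypothetical, not Sutro's actual costs.
    def price_per_million_tokens(hourly_hardware_cost: float,
                                 tokens_per_hour: float,
                                 margin_per_million: float = 0.0) -> float:
        """API Price ≈ (Hourly Hardware Cost / Throughput) + Margin, per 1M tokens."""
        cost_per_token = hourly_hardware_cost / tokens_per_hour
        return cost_per_token * 1_000_000 + margin_per_million

    # Example: an $8/hr GPU sustaining 2M tokens/hr, with a $1 margin per 1M tokens
    print(price_per_million_tokens(8.0, 2_000_000, 1.0))  # -> 5.0 ($ per 1M tokens)

If that is how the pricing page is built, the listed average would just be observed hardware cost divided by realized throughput, plus whatever margin is layered on top.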