
113 points by sethkim | 2 comments
sharkjacobs
> If you’re building batch tasks with LLMs and are looking to navigate this new cost landscape, feel free to reach out to see how Sutro can help.

I don't have any reason to doubt the article's reasoning or the conclusions it reaches, but it's important to recognize that the article is part of a sales pitch.

1. sethkim
Yes, we're a startup, and LLM inference is a major component of what we do. More importantly, we're working on making these models accessible as analytical processing tools, so we have a strong focus on making them cost-effective at scale.
2. sharkjacobs
I see your pricing page lists the average cost per million tokens. Is that because you are using the formula you describe, which depends on hardware time and throughput?

> API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
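
A minimal sketch of that formula's arithmetic, assuming purely hypothetical numbers (the $2.00/hour hardware cost, 1.5M tokens/hour throughput, and $0.20 margin below are illustration values, not Sutro's actual figures):

    # Sketch of the quoted pricing formula, expressed per million tokens.
    # All input values in the example call are hypothetical.
    def price_per_million_tokens(hourly_hardware_cost: float,
                                 tokens_per_hour: float,
                                 margin_per_million: float) -> float:
        # API Price ≈ (Hourly Hardware Cost / Throughput) + Margin
        cost_per_token = hourly_hardware_cost / tokens_per_hour
        return cost_per_token * 1_000_000 + margin_per_million

    # e.g. a $2.00/hour GPU sustaining 1.5M tokens/hour, with a $0.20
    # margin per million tokens, prices out at about $1.53 per million:
    print(price_per_million_tokens(2.00, 1_500_000, 0.20))  # ~1.533

The point of the formula is that a published per-token price is just amortized hardware time: if throughput per GPU-hour rises, the break-even price per million tokens falls proportionally.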