
113 points by sethkim | 2 comments
sharkjacobs
> If you’re building batch tasks with LLMs and are looking to navigate this new cost landscape, feel free to reach out to see how Sutro can help.

I don't have any reason to doubt the article's reasoning or the conclusions it reaches, but it's important to recognize that the article is part of a sales pitch.

1. sethkim
Yes, we're a startup, and LLM inference is a major component of what we do. More importantly, we're working on making these models accessible as analytical processing tools, so we have a strong focus on making them cost-effective at scale.
2. sharkjacobs
I see your pricing page lists the average cost per million tokens. Is that because you are using the formula you describe, which depends on hardware time and throughput?

> API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
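
A minimal sketch of that formula's arithmetic, assuming purely hypothetical numbers (the $2.00/hour hardware cost, 1.5M tokens/hour throughput, and $0.20 margin below are illustration values, not Sutro's actual figures):

    # Sketch of the quoted pricing formula, expressed per million tokens.
    # All input values in the example call are hypothetical.
    def price_per_million_tokens(hourly_hardware_cost: float,
                                 tokens_per_hour: float,
                                 margin_per_million: float) -> float:
        # API Price ≈ (Hourly Hardware Cost / Throughput) + Margin
        cost_per_token = hourly_hardware_cost / tokens_per_hour
        return cost_per_token * 1_000_000 + margin_per_million

    # e.g. a $2.00/hour GPU sustaining 1.5M tokens/hour, with a $0.20
    # margin per million tokens, prices out at about $1.53 per million:
    print(price_per_million_tokens(2.00, 1_500_000, 0.20))  # ~1.533

The point of the formula is that a published per-token price is just amortized hardware time: if throughput per GPU-hour rises, the break-even price per million tokens falls proportionally.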