I don't have any reason to doubt the reasoning this article is doing or the conclusions it reaches, but it's important to recognize that this article is part of a sales pitch.
But the thrust of the article is that, contrary to conventional wisdom, we shouldn't expect LLMs to keep getting more efficient, and so it's worthwhile to explore other options for cost savings in inference, such as batch processing.
The conclusion they reach directly serves what they're selling.
I'll repeat: I'm not disputing anything in this article. I'm really not, and I'm not being coy or making allusions without saying anything directly. If I thought this was bullshit, I wouldn't be afraid to semi-anonymously post a comment saying so.
But this is advertising, just like Backblaze's hard drive reliability blog posts are advertising.
> API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
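As a rough sanity check on that quoted formula, here's a minimal back-of-envelope sketch in Python. All the numbers (GPU hourly cost, throughput, margin) are illustrative assumptions of mine, not figures from the article:

```python
# Back-of-envelope sketch of the quoted pricing formula.
# Every number below is an assumed, illustrative value.

hourly_hardware_cost = 2.00   # assumed $/hour for one GPU instance
tokens_per_second = 1_000     # assumed aggregate serving throughput
tokens_per_hour = tokens_per_second * 3600

margin_per_million = 0.50     # assumed provider margin, $ per 1M tokens

# API Price ≈ (Hourly Hardware Cost / Throughput in Tokens per Hour) + Margin
cost_per_token = hourly_hardware_cost / tokens_per_hour
price_per_million = cost_per_token * 1_000_000 + margin_per_million

print(f"Hardware cost per 1M tokens: ${cost_per_token * 1e6:.2f}")
print(f"Implied API price per 1M tokens: ${price_per_million:.2f}")
```

With these made-up inputs, hardware alone works out to about $0.56 per million tokens, so margin ends up being roughly half of the implied price. The point of the formula is that if hardware cost and throughput plateau, the only levers left are margin and utilization, which is where things like batch processing come in.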