
566 points PaulHoule | 5 comments
JimDabell ◴[] No.44490396[source]
Pricing:

US$0.000001 per output token ($1/M tokens)

US$0.00000025 per input token ($0.25/M tokens)

https://platform.inceptionlabs.ai/docs#models
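
A quick sketch of what those listed rates work out to per request (the token counts below are made-up examples, not anything from the docs):

```python
# Mercury rates quoted above: $0.25/M input tokens, $1/M output tokens.
INPUT_RATE = 0.25 / 1_000_000   # USD per input token
OUTPUT_RATE = 1.00 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 2,000-token prompt with a 500-token completion:
print(f"${request_cost(2_000, 500):.6f}")  # $0.001000
```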

replies(1): >>44490656 #
asaddhamani ◴[] No.44490656[source]
The pricing is a little on the high side. For a performance-sensitive application I'm working on, I tried Mercury and Groq (Llama 3.1 8B, Llama 4 Scout); the performance was neck-and-neck, but the pricing was way better for Groq.

But I'll be following diffusion models closely, and I hope we get some good open source ones soon. Excited about their potential.

replies(1): >>44492609 #
1. tripplyons ◴[] No.44492609[source]
Good to know. I didn't realize how good the pricing is on Groq!
replies(2): >>44493536 #>>44497444 #
2. tlack ◴[] No.44493536[source]
If your application is price-sensitive, check out DeepInfra.com - they have a variety of models in the pennies-per-million-tokens range. Not quite as fast as Mercury, Groq, or SambaNova, though.

(I have no affiliation with this company aside from being a happy customer the last few years)

replies(1): >>44498128 #
3. sexeriy237 ◴[] No.44497444[source]
You're getting the savings by shifting the pollution of the datacenter onto a largely black community and choking them out.
replies(1): >>44498135 #
4. asaddhamani ◴[] No.44498128[source]
DeepInfra is amazing in terms of price, really: they have the Qwen3 embedding model for $0.002 per M tokens. That's an order of magnitude cheaper than most alternatives, with better benchmark scores. But the P99 latency is high and the variance is huge, which is problematic for latency-sensitive workloads; if they can fix that, using them will be a no-brainer. DeepInfra does tend to have the lowest prices of any API provider.
5. JimDabell ◴[] No.44498135[source]
Are you confusing the AI company Groq with xAI, Elon Musk’s AI company that has a model called Grok?