    167 points xnx | 15 comments
    1. tripplyons No.44527652
    For those who aren't aware, OpenAI has a very similar batch mode (50% discount if you wait up to 24 hours): https://platform.openai.com/docs/api-reference/batch

    It's nice to see competition in this space. AI is getting cheaper and cheaper!
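
    A minimal sketch of the flow with the openai Python client, for the curious (the model name and file contents here are illustrative, not taken from the docs page):

        from openai import OpenAI

        client = OpenAI()

        # requests.jsonl holds one request per line, e.g.:
        # {"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions",
        #  "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}}
        batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

        batch = client.batches.create(
            input_file_id=batch_file.id,
            endpoint="/v1/chat/completions",
            completion_window="24h",  # the up-to-24-hour window behind the 50% discount
        )
        print(batch.id, batch.status)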

    replies(4): >>44528108 #>>44528444 #>>44528451 #>>44532342 #
    2. fantispug No.44528108
    Yes, this seems to be a common capability: Anthropic and Mistral have something very similar, as do resellers like AWS Bedrock.

    I guess it lets them better utilise their hardware in quiet times throughout the day. It's interesting that they all picked a 50% discount.

    replies(3): >>44528237 #>>44529423 #>>44532883 #
    3. qrian No.44528237
    Bedrock has a batch mode, but only for Claude 3.5, which is about a year old, so it isn't very useful.
    4. bayesianbot No.44528444
    DeepSeek has gone a slightly different route: they give an automatic 75% discount between UTC 16:30 and 00:30.

    https://api-docs.deepseek.com/quick_start/pricing
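
    Since the discount is purely wall-clock based, a client-side check is all it takes to route traffic into the cheap window. A minimal sketch (window boundaries taken from the pricing page above):

        from datetime import datetime, time, timezone

        def in_deepseek_offpeak(now=None):
            """True if the current UTC time falls in the 16:30-00:30 discount window."""
            t = (now or datetime.now(timezone.utc)).time()
            # the window wraps midnight, hence the OR of two half-ranges
            return t >= time(16, 30) or t < time(0, 30)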

    5. dlvhdr No.44528451
    The latest price increases beg to differ
    replies(2): >>44529179 #>>44530641 #
    6. dmos62 No.44529179
    What price increases?
    replies(1): >>44529317 #
    7. rvnx No.44529317{3}
    I guess the Gemini price increase
    replies(1): >>44531095 #
    8. calaphos No.44529423
    Inference throughput scales really well with larger batch sizes (at the cost of latency) due to rising arithmetic intensity and the fact that it's almost always memory-bandwidth limited.
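
    A back-of-envelope roofline shows why; all of the hardware and model numbers below are assumptions, but the shape of the result isn't:

        # Decode-step throughput vs. batch size (ignores KV-cache traffic).
        PEAK_FLOPS = 1e15      # accelerator peak, FLOP/s (assumed)
        MEM_BW = 3e12          # HBM bandwidth, bytes/s (assumed)
        PARAMS = 70e9          # model parameters (assumed)
        BYTES_PER_PARAM = 2    # fp16 weights

        for batch in (1, 8, 64, 512):
            flops = 2 * PARAMS * batch               # one token per sequence in the batch
            weight_bytes = PARAMS * BYTES_PER_PARAM  # weights are streamed once per step
            step_time = max(flops / PEAK_FLOPS, weight_bytes / MEM_BW)
            print(f"batch {batch:4d}: {batch / step_time:,.0f} tokens/s")

    With these numbers, throughput rises almost linearly with batch size until the compute roof takes over around batch ~330, and that headroom is exactly what a discounted batch tier can sell off.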
    9. dist-epoch No.44530641
    Only because Flash was mispriced to start with. It was set too cheap compared with its capabilities. They didn't raise the price of Pro.
    10. dmos62 No.44531095{4}
    Ah, the 2.5 Flash non-thinking price was increased to match the price of 2.5 Flash thinking.
    replies(1): >>44532762 #
    11. laborcontract No.44532342
    One open secret is that batch mode generations often take much less than 24 hours. I've done a lot of generations where I get my results within 5ish minutes.
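
    For what it's worth, with OpenAI's batch API (linked upthread) the status is pollable, so you don't have to sit out the full window. A rough sketch; the batch id and polling interval are arbitrary:

        import time
        from openai import OpenAI

        client = OpenAI()
        batch = client.batches.retrieve("batch_abc123")  # illustrative id
        while batch.status in ("validating", "in_progress", "finalizing"):
            time.sleep(30)
            batch = client.batches.retrieve(batch.id)
        if batch.status == "completed":
            results = client.files.content(batch.output_file_id)  # JSONL, one line per request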
    replies(1): >>44537409 #
    12. Workaccount2 No.44532762{5}
    No, 2.5 Flash non-thinking was replaced with 2.5 Flash Lite, and 2.5 Flash thinking had its cost rebalanced (input price increased, output price decreased).

    2.5 Flash non-thinking doesn't exist anymore. People call it a price increase, but it's really just confusion about what Google did.

    replies(1): >>44536758 #
    13. briangriffinfan No.44532883
    50% is my personal threshold for a discount going from not worth it to worth it.
    14. sunaookami No.44536758{6}
    They try to frame it as such, but 2.5 Flash Lite is not the same as 2.5 Flash without thinking. It's worse.
    15. ridgewell No.44537409
    To my understanding it depends a lot on the shape of your batch. A small batch job can be scheduled a lot sooner than a large one, which has to wait for a moment when enough capacity frees up.