
167 points | xnx | 1 comment
tripplyons No.44527652
For those who aren't aware, OpenAI has a very similar batch mode (50% discount if you wait up to 24 hours): https://platform.openai.com/docs/api-reference/batch

It's nice to see competition in this space. AI is getting cheaper and cheaper!
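As a concrete illustration of the batch mode mentioned above: OpenAI's Batch API takes a JSONL file with one request object per line, each carrying a `custom_id` for matching results later. The sketch below only builds that payload locally (model name and prompts are placeholders); actually submitting it requires uploading the file and creating a batch via the API, which is omitted here.

```python
import json

# Placeholder prompts; in practice these would be your real workload.
prompts = ["Summarize photosynthesis.", "Explain TCP slow start."]

lines = []
for i, prompt in enumerate(prompts):
    lines.append(json.dumps({
        "custom_id": f"request-{i}",       # caller-chosen ID for matching results
        "method": "POST",
        "url": "/v1/chat/completions",     # endpoint the batched requests target
        "body": {
            "model": "gpt-4o-mini",        # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
    }))

# One request per line, as the Batch API expects.
batch_jsonl = "\n".join(lines)
print(len(batch_jsonl.splitlines()))
```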

replies(4): >>44528108 >>44528444 >>44528451 >>44532342
fantispug No.44528108
Yes, this seems to be a common capability - Anthropic and Mistral have something very similar as do resellers like AWS Bedrock.

I guess it lets them better utilise their hardware during quiet times throughout the day. It's interesting that they all picked the same 50% discount.

replies(3): >>44528237 >>44529423 >>44532883
calaphos No.44529423
Inference throughput scales really well with larger batch sizes (at the cost of latency), because arithmetic intensity rises with batch size and decoding is almost always memory-bandwidth limited.
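The batching argument can be made concrete with a toy roofline calculation: for one fp16 weight matrix of shape (d, d), a batch of b token-vectors performs about 2·b·d² FLOPs while reading roughly the same 2·d² bytes of weights once, so FLOPs per byte moved grow almost linearly with b until activations or the compute roofline dominate. The numbers below are illustrative, not measured on any real GPU.

```python
def arithmetic_intensity(batch: int, d: int = 4096) -> float:
    """Toy FLOPs-per-byte estimate for a batched (batch, d) x (d, d) matmul."""
    flops = 2 * batch * d * d       # multiply-accumulate FLOPs for the batch
    weight_bytes = 2 * d * d        # fp16 weights, read once per batch
    act_bytes = 2 * 2 * batch * d   # fp16 activations, input and output
    return flops / (weight_bytes + act_bytes)

# Intensity climbs nearly linearly with batch size while weights dominate traffic.
for b in (1, 8, 64):
    print(b, round(arithmetic_intensity(b), 2))
```

At batch size 1 the intensity is about 1 FLOP per byte (pure weight streaming), which is why single-stream decoding leaves the GPU's compute units mostly idle and why providers can sell off-peak batched capacity cheaply.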