Batch Mode in the Gemini API: Process More for Less

(developers.googleblog.com)

167 points xnx | 2 comments | 07 Jul 25 16:30 UTC

dsjoerg [11 Jul 25 02:11 UTC] No.44527801

>>44492014 (OP)

We used the previous version of this batch mode, which went through BigQuery. It didn't work well for us at the time because we were in development mode and we needed faster cycle time to iterate and learn. Sometimes the response would come back much faster than 24 hours, but sometimes not. There was no visibility offered into what response time you would get; just submit and wait.

You have to be pretty darn sure that your job is going to do exactly what you want to be able to wait 24 hours for a response. It's like going back to the punched-card era. If I could get even 1% of the batch in a quicker response and then the rest more slowly, that would have made a big difference.
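The "1% first, rest later" idea can be approximated on the client side by carving a small random pilot sample out of the job before submitting it. The sketch below only does the split; it assumes you would send the pilot through the regular (non-batch) endpoint to sanity-check outputs and then submit the remainder as a batch. Neither the commenter nor the Gemini batch API describes this helper; `split_pilot` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_pilot(
    requests: Sequence[T], fraction: float = 0.01, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly set aside roughly `fraction` of the requests as a pilot sample.

    Returns (pilot, remainder). The pilot always has at least one request when
    the input is non-empty, and the original order is preserved in both parts.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = len(requests)
    if n == 0:
        return [], []
    k = max(1, math.ceil(n * fraction))
    # A seeded RNG makes the split reproducible across reruns.
    pilot_idx = set(random.Random(seed).sample(range(n), k))
    pilot = [r for i, r in enumerate(requests) if i in pilot_idx]
    rest = [r for i, r in enumerate(requests) if i not in pilot_idx]
    return pilot, rest
```

For 1,000 prompts at the default 1%, this yields a 10-item pilot and a 990-item remainder. Random sampling (rather than taking the first 10) makes the pilot more likely to surface problems that only appear in some inputs.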

cpard [11 Jul 25 02:14 UTC] No.44527819

>>44527801

It seems that the 24h SLA is standard for batch inference among the vendors, and I wonder how useful it can be when you have no visibility into when the job will be delivered.

I wonder why they do that and who is actually getting value out of these batch APIs.

Thanks for sharing your experience!

1. 3eb7988a1663 [11 Jul 25 02:32 UTC] No.44527911

>>44527819

Think of it like you have a large queue of work to be done (e.g. summarize N decades of historical documents). There is little urgency to the outcome because the bolus is so large. You just want to maintain steady progress on the backlog, where cost optimization is more important than timing.
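Working through such a backlog usually means cutting it into bounded chunks that can each be submitted as an independent batch job. A minimal sketch of that chunking step, assuming a per-job item limit `max_items` (a hypothetical parameter; check your provider's actual batch size limits):

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(backlog: Iterable[T], max_items: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `max_items` items from the backlog.

    Works on any iterable, so a huge backlog can be streamed from disk
    without loading it all into memory first.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    it = iter(backlog)
    while True:
        chunk = list(islice(it, max_items))
        if not chunk:
            return
        yield chunk
```

For example, `list(batches(range(7), 3))` gives `[[0, 1, 2], [3, 4, 5], [6]]`. Because each chunk is independent, a late chunk only delays its own results, which is why this kind of workload tolerates an open-ended 24h turnaround.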

2. cpard [11 Jul 25 06:28 UTC] No.44528950

>>44527911 (TP)

Yes, what you describe feels like a one-off job that you want to run: big, but not time-critical.

Here's an example:

If you are a TV broadcaster and you want to summarize and annotate the content generated in the past 12 hours, you most likely also need access to the summaries of the previous 12 hours.

Now if you submit a batch job for the first 12 hours of content, you might end up in a situation where you want to process the next batch but the previous one has not been delivered yet.

And imo that would be fine if you somehow knew it would take more than 12h to complete, but instead it might be delivered to you in 1h or in 23h.

That's the part of these batch APIs I find hard to understand: how do you use them in a production environment outside of one-off jobs?
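The broadcaster scenario can be made concrete with a small scheduling model. It assumes, as described above, that a new 12-hour window of content becomes available every 12 hours, that each window's batch job can only be submitted once the previous window's summaries have been delivered, and that every job completes somewhere within a 24h SLA. The function name and parameters are hypothetical; the model ignores everything except these dependencies.

```python
from typing import List, Sequence, Tuple


def schedule_chain(
    latencies_h: Sequence[float], window_h: float = 12.0, sla_h: float = 24.0
) -> List[Tuple[float, float]]:
    """Return (submit_time, done_time) in hours for each window's batch job.

    Window k's content is complete at (k + 1) * window_h, but its job is only
    submitted once window k-1's job has delivered (a chained dependency).
    """
    schedule = []
    prev_done = 0.0
    for k, latency in enumerate(latencies_h):
        if not 0.0 <= latency <= sla_h:
            raise ValueError(f"latency {latency}h is outside the {sla_h}h SLA")
        available = (k + 1) * window_h
        submit = max(available, prev_done)  # wait for both content and prior summaries
        done = submit + latency
        schedule.append((submit, done))
        prev_done = done
    return schedule
```

With latencies of 1h, 23h, 23h and 1h (all within the SLA), `schedule_chain([1, 23, 23, 1])` gives submit times of 12, 24, 47 and 70 hours. The third and fourth windows therefore wait 11h and 22h after their content is ready, so two slow deliveries in a row push the pipeline further behind real time even though no single job broke the SLA.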