dsjoerg ◴[] No.44527801[source]
We used the previous version of this batch mode, which went through BigQuery. It didn't work well for us at the time because we were in development mode and we needed faster cycle time to iterate and learn. Sometimes the response would come back much faster than 24 hours, but sometimes not. There was no visibility offered into what response time you would get; just submit and wait.

You have to be pretty darn sure that your job is going to do exactly what you want to be able to wait 24 hours for a response. It's like going back to the punched-card era. If I could get even 1% of the batch in a quicker response and then the rest more slowly, that would have made a big difference.
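For illustration, a minimal sketch of that kind of split: send a small sample through the synchronous endpoint for fast feedback and queue the rest as a batch job. Here `call_sync` and `submit_batch` are placeholders for whichever client you use, not any particular vendor's API.

    import random
    from typing import Callable, Sequence

    def split_submit(requests: Sequence[dict],
                     call_sync: Callable[[dict], dict],
                     submit_batch: Callable[[list[dict]], str],
                     sample_frac: float = 0.01) -> tuple[list[dict], str]:
        """Send ~1% of requests synchronously for quick feedback on the prompt,
        then queue the remainder as a batch job.
        Returns (sample_results, batch_job_id)."""
        k = max(1, int(len(requests) * sample_frac))
        picked = set(random.sample(range(len(requests)), k))
        sample = [r for i, r in enumerate(requests) if i in picked]
        remainder = [r for i, r in enumerate(requests) if i not in picked]
        sample_results = [call_sync(r) for r in sample]   # fast, interactive path
        job_id = submit_batch(remainder)                  # slow, cheap bulk path
        return sample_results, job_id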

cpard ◴[] No.44527819[source]
It seems that the 24h SLA is standard for batch inference among vendors, and I wonder how useful it can be when you have no visibility into when the job will be delivered.

I wonder why they do that and who is actually getting value out of these batch APIs.

Thanks for sharing your experience!

1. vineyardmike ◴[] No.44527850[source]
It’s like most batch processes: it’s not useful if you don’t know what the response will be and you’re iterating interactively. But for data pipelines, analytics workloads, etc., you can handle that delay because no one is waiting on the response.

I’m a developer working on a product that lets users upload content. The upload is not time-sensitive. We pass the content through a review pipeline where we do moderation and analysis, plus some business-specific checks that the user uploaded relevant content. We’re migrating some of that to an LLM-based approach because (in testing) the results are just as good, and tweaking a prompt is easier than updating code. We’ll probably use a batch API for this and accept that content can take 24 hours to be audited.
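For illustration, a hedged sketch of what that submission step could look like: one moderation prompt per upload, written as JSONL, the input shape most batch APIs accept. The prompt and field names here are assumptions, not any specific vendor's format.

    import json

    MODERATION_PROMPT = (
        "Review the following user-uploaded content. Answer with JSON: "
        '{"relevant": bool, "policy_violation": bool, "reason": str}\n\n'
    )

    def build_batch_file(uploads: list[dict],
                         path: str = "moderation_batch.jsonl") -> str:
        """Write one request per upload; custom_id lets results be joined
        back to the original upload once the batch completes."""
        with open(path, "w") as f:
            for upload in uploads:
                request = {
                    "custom_id": upload["id"],
                    "prompt": MODERATION_PROMPT + upload["text"],
                }
                f.write(json.dumps(request) + "\n")
        return path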

2. cpard ◴[] No.44528900[source]
Yeah, I get that part of batch, but even with batch processing you usually want some sense of when the data will be done, especially when downstream processes depend on it.

The other part that I think makes batch LLM inference unique is that the results are not deterministic. That's why the parent's point matters: at least some of the data should come back earlier, even if the rest is only available in 24h.
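For illustration, one way a downstream pipeline can cope with the unpredictable completion time: poll the job up to a deadline, then fall back to the synchronous API for anything still pending. `get_batch_status`, `get_batch_results`, and `call_sync` are placeholders, not a real client.

    import time

    def wait_or_fallback(job_id, pending_requests,
                         get_batch_status, get_batch_results, call_sync,
                         deadline_s=6 * 3600, poll_s=300):
        """Poll the batch job until a deadline; if it passes, unblock the
        downstream pipeline by paying the higher synchronous price for the
        remaining requests."""
        start = time.time()
        while time.time() - start < deadline_s:
            if get_batch_status(job_id) == "completed":
                return get_batch_results(job_id)
            time.sleep(poll_s)
        return [call_sync(r) for r in pending_requests]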