Batch Mode in the Gemini API: Process More for Less

(developers.googleblog.com)

167 points xnx | 1 comment | 07 Jul 25 16:30 UTC


dsjoerg ◴[11 Jul 25 02:11 UTC] No.44527801[source]▶

>>44492014 (OP) #

We used the previous version of this batch mode, which went through BigQuery. It didn't work well for us at the time because we were in development mode and we needed faster cycle time to iterate and learn. Sometimes the response would come back much faster than 24 hours, but sometimes not. There was no visibility offered into what response time you would get; just submit and wait.

You have to be pretty darn sure that your job is going to do exactly what you want to be able to wait 24 hours for a response. It's like going back to the punched-card era. If I could get even 1% of the batch in a quicker response and then the rest more slowly, that would have made a big difference.
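One way to approximate the "1% first" idea on the client side is to carve off a small canary sample and send it through a faster path before committing the rest to the batch queue. This is only a sketch: `split_canary` is a hypothetical helper, not part of the Gemini API, and how the canary is actually sent (for example through the regular interactive endpoint) is left to the caller.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_canary(
    requests: Sequence[T], fraction: float = 0.01, seed: int = 0
) -> tuple[list[T], list[T]]:
    """Split requests into a small canary sample and the remainder.

    Hypothetical helper: the canary is meant to be run through a fast path
    so results can be inspected before the rest is submitted as a batch job.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not requests:
        return [], []

    # Always keep at least one canary request, even for tiny inputs.
    n_canary = max(1, round(len(requests) * fraction))

    # A seeded RNG keeps the split reproducible between runs.
    rng = random.Random(seed)
    canary_idx = set(rng.sample(range(len(requests)), n_canary))

    # Both output lists preserve the original request order.
    canary = [r for i, r in enumerate(requests) if i in canary_idx]
    rest = [r for i, r in enumerate(requests) if i not in canary_idx]
    return canary, rest
```

The canary costs full price, but at 1% of the volume that is a small premium for finding a broken prompt before waiting up to 24 hours on the remaining 99%.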

replies(4): >>44527819 #>>44528277 #>>44528385 #>>44530651 #

cpard ◴[11 Jul 25 02:14 UTC] No.44527819[source]▶

>>44527801 #

It seems that a 24h SLA is standard for batch inference among the vendors. I wonder how useful it can be when you have no visibility into when the job will be delivered.

I wonder why they do that and who is actually getting value out of these batch APIs.

Thanks for sharing your experience!

replies(5): >>44527850 #>>44527911 #>>44528102 #>>44528329 #>>44530652 #

1. dist-epoch ◴[11 Jul 25 10:48 UTC] No.44530652[source]▶

>>44527819 #

> you have no visibility on when the job will be delivered

You do have - within 24 hours. So don't submit requests you need in 10 hours.
