Batch Mode in the Gemini API: Process More for Less

(developers.googleblog.com)

167 points xnx | 1 comments | 07 Jul 25 16:30 UTC

segalord ◴[11 Jul 25 07:54 UTC] No.44529444[source]▶

>>44492014 (OP) #

Man, Google's offerings are so inconsistent. Batch processing has been available on Vertex for a while now; I don't really get why they have two different offerings in Vertex and Gemini. Both are equally inaccessible.

replies(2): >>44530311 #>>44530805 #

rockwotj ◴[11 Jul 25 11:10 UTC] No.44530805[source]▶

>>44529444 #

It’s because Vertex is the “enterprise” offering that is HIPAA compliant, etc. That is why Vertex only has explicit prompt caching and not implicit, etc. Vertex usage is never used for training or model feedback, but Gemini API usage is. Basically, the Gemini API is Google’s way of moving faster, like OpenAI and the other foundation model providers, while still having an enterprise offering. Go check Anthropic’s documentation: they even say that if you have enterprise or regulatory needs, you should use Bedrock or Vertex.

replies(1): >>44532647 #

1. Deathmax ◴[11 Jul 25 14:39 UTC] No.44532647[source]▶

>>44530805 #

Vertex's offering of Gemini very much does implicit caching, and always has [1]. The recently added implicit cache hit discounts also apply on Vertex, as long as you hit one of the regional endpoints rather than the `global` endpoint.

[1]: http://web.archive.org/web/20240517173258/https://cloud.goog..., "By default Google caches a customer's inputs and outputs for Gemini models to accelerate responses to subsequent prompts from the customer. Cached contents are stored for up to 24 hours."
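
To make the regional vs `global` distinction concrete, here is a minimal sketch of how the two request URLs might differ. The host and path shape are an assumption based on the usual Vertex AI REST convention, and the function name, project ID, and model name are hypothetical placeholders; whether the discount actually applies is the claim above, not something this code checks.

```python
def vertex_generate_url(project: str, location: str, model: str) -> str:
    """Build a generateContent URL for a Vertex AI location.

    Assumption: regional endpoints use a "<location>-aiplatform" host,
    while the `global` endpoint uses the bare "aiplatform" host.
    """
    if location == "global":
        host = "aiplatform.googleapis.com"
    else:
        host = f"{location}-aiplatform.googleapis.com"
    return (
        f"https://{host}/v1/projects/{project}/locations/{location}"
        f"/publishers/google/models/{model}:generateContent"
    )


def is_regional(location: str) -> bool:
    """True for any location other than the `global` endpoint."""
    return location != "global"


if __name__ == "__main__":
    # Hypothetical project and model names, for illustration only.
    for loc in ("us-central1", "global"):
        print(loc, is_regional(loc), vertex_generate_url("my-project", loc, "gemini-2.5-flash"))
```

Pinning the location explicitly in one place like this makes it harder to end up on the `global` endpoint by accident.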