(blog.google)

612 points meetpateltech | 5 comments | 05 Feb 25 16:03 UTC

1. Ninjinka [05 Feb 25 17:21 UTC] No.42951897

>>42950454 (OP)

Pricing is CRAZY.

Audio input costs $0.70 per million tokens on 2.0 Flash, versus $0.075 on 2.0 Flash-Lite and 1.5 Flash.

For gpt-4o-mini-audio-preview, it's $10 per million tokens of audio input.
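A rough back-of-the-envelope sketch of what these per-token prices mean in practice. It assumes Gemini's documented rate of 32 audio tokens per second. OpenAI tokenizes audio differently, so gpt-4o-mini-audio-preview is compared per token only, not per hour of audio. The dictionary keys and helper names are illustrative, not official identifiers.

```python
# Back-of-the-envelope audio pricing comparison using the prices quoted above.
# ASSUMPTION: Gemini represents audio as 32 tokens per second of input.
# OpenAI's audio tokenization rate is different and is NOT modeled here.

PRICE_PER_MILLION_USD = {
    "gemini-2.0-flash": 0.70,
    "gemini-2.0-flash-lite": 0.075,
    "gemini-1.5-flash": 0.075,
    "gpt-4o-mini-audio-preview": 10.00,
}

GEMINI_AUDIO_TOKENS_PER_SECOND = 32  # assumption, see lead-in


def gemini_cost_per_hour(model: str) -> float:
    """Estimated USD cost of sending one hour of audio to a Gemini model."""
    tokens = GEMINI_AUDIO_TOKENS_PER_SECOND * 3600  # 115,200 tokens per hour
    return tokens * PRICE_PER_MILLION_USD[model] / 1_000_000


def price_ratio(a: str, b: str) -> float:
    """How many times more model `a` charges per audio token than model `b`."""
    return PRICE_PER_MILLION_USD[a] / PRICE_PER_MILLION_USD[b]


if __name__ == "__main__":
    for m in ("gemini-2.0-flash", "gemini-2.0-flash-lite", "gemini-1.5-flash"):
        print(f"{m}: ${gemini_cost_per_hour(m):.4f} per hour of audio")
    r1 = price_ratio("gemini-2.0-flash", "gemini-2.0-flash-lite")
    r2 = price_ratio("gpt-4o-mini-audio-preview", "gemini-2.0-flash")
    print(f"2.0 Flash vs 2.0 Flash-Lite, per token: {r1:.2f}x")
    print(f"gpt-4o-mini-audio-preview vs 2.0 Flash, per token: {r2:.2f}x")
```

Under that assumption, an hour of audio works out to roughly $0.08 on 2.0 Flash and under a cent on 2.0 Flash-Lite or 1.5 Flash, while gpt-4o-mini-audio-preview charges about 14 times more per audio token than 2.0 Flash.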

replies(2): >>42952141 >>42952542

2. sunaookami [05 Feb 25 17:37 UTC] No.42952141

>>42951897 (TP)

Sadly: "Gemini can only infer responses to English-language speech."

https://ai.google.dev/gemini-api/docs/audio?lang=rest#techni...

replies(1): >>42958141

3. KTibow [05 Feb 25 18:01 UTC] No.42952542

>>42951897 (TP)

The price increase for 2.0 Flash is likely because 1.5 Flash was actually cheaper than all other STT services. I wrote about this a while ago at https://ktibow.github.io/blog/geminiaudio/.

replies(1): >>42953271

4. radeeyate [05 Feb 25 18:49 UTC] No.42953271

>>42952542

I feel that the audio-understanding capabilities of the Gemini models go beyond plain STT. If you give it something like a song, it can tell you things about it.

5. mbrock [06 Feb 25 02:08 UTC] No.42958141

>>42952141

I don't know what they mean by this, but the obvious interpretation is not true. It understands other languages and even does really well with underrepresented languages; in my case, Latvian.

Gemini 2.0 is now available to everyone