257 points by amrrs
Mizza (No.41842147)
What's SOTA for open source or on-device right now?

I tried building a babelfish with o1, but the transcription in languages other than English is useless. When it gets it right, the translations are pretty much perfect and the voice responses are super fast, but without good transcription it's kind of pointless. So close!

refulgentis (No.41842200)
I'm not fully sure what you mean: this is TTS, but it sounds like you're expecting an answer about transcription.

So it's hard to know both which category you'd like to hear about and, if you do mean transcription, what your baseline is.

Whisper is widely regarded as the best in the free camp, though I wouldn't be surprised to see a paper with a model claiming better WER (word error rate), or a much bigger model.
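Since "better WER" is the yardstick for comparing transcription models: word error rate is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's output, divided by the number of words in the reference. A minimal sketch in plain Python, assuming whitespace tokenization and no text normalization (real evaluations usually lowercase and strip punctuation first); the function name `word_error_rate` is just illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes plain whitespace tokenization with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) plus one deletion ("the") over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER is not capped at 1.0: a hypothesis with many inserted words can score above 100%, which is one reason papers usually report it alongside the normalization they used.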

If you meant you tried OpenAI's realtime 4o, and not o1*, it uses Whisper for transcription on the server, so I don't think you'll gain much from trying Whisper yourself. My next try would be the Google Cloud APIs, but they're paid, and regarding your question about open-source SOTA, the underlying model isn't open.

But also, if you did mean 4o, the transcription shouldn't matter for output quality, because the model takes in voice directly. (I verified their claim by noticing that when there are errors in the transcription, it still answers correctly.)

* I keep messing these two up when talking about it, and it seems unlikely you meant o1 because it has a long synchronous delay before any part of the answer is available, and doesn't take in audio.

If you did mean o1, then I'd use realtime 4o for the voice side and have it do the translation natively, so it won't be affected by the transcription errors you're facing now.

krageon (No.41846326)
GP said local / on-device. Most of what you mentioned is cloud shit.

refulgentis (No.41850855)
Yeah, I covered on-device. Okay, let's call the rest cloud shit. Like I said, it's a confusing comment: they asked about open source and on-device, then described quality issues with the cloud shit they're using, which certainly won't be resolved by switching to on-device models. shrug