257 points amrrs | 1 comments
Mizza ◴[] No.41842147[source]
What's SOTA for open source or on-device right now?

I tried building a babelfish with o1, but the transcription in languages other than English is useless. When it gets it correct, the translations are pretty perfect and the voice responses are super fast, but without good transcription it's kind of useless. So close!

replies(5): >>41842153 #>>41842200 #>>41842281 #>>41843179 #>>41846783 #
kabirgoel ◴[] No.41843179[source]
I work at Cartesia, which operates a TTS API similar to Play [1]. I’d be willing to venture a guess and say that our TTS model, Sonic, is probably SoTA for on-device, but don't quote me on that claim. It's the same model that powers our API.

Sonic can be run on a MacBook Pro. Our API sounds better, of course, since that's running the model on GPUs without any special tricks like quantization. But subjectively the on-device version is good quality and real-time, and it possesses all the capabilities of the larger model, such as voice cloning.

Our co-founders did a demo of the on-device capabilities on the No Priors podcast [2], if you're interested in checking it out for yourself. (I will caveat that this sounds quite a bit worse than if you heard it in person today, since this was an early alpha + it's a recording of the output from a MacBook Pro speaker.)

[1] https://cartesia.ai/sonic

[2] https://youtu.be/neQbqOhp8w0?si=2n1i432r5fDG2tPO&t=1886

replies(1): >>41862255 #
1. pietz ◴[] No.41862255[source]
Is your model really open source or did you misunderstand the question?