
GPT-5.2

(openai.com)
1019 points atgctg | 1 comments
zug_zug ◴[] No.46235131[source]
For me the last remaining killer feature of ChatGPT is the quality of the voice chat. Do any of the competitors have something like that?
FrasiertheLion ◴[] No.46235139[source]
Try elevenlabs
sosodev ◴[] No.46235296[source]
Does elevenlabs have a real-time conversational voice model? It seems like their focus is largely on text to speech and speech to text, which can approximate that type of thing, but it's not at all the same as the native voice-to-voice that 4o does.
1. hi_im_vijay ◴[] No.46236377[source]
[disclaimer, i work at elevenlabs] we specifically went with a cascading model for our agents platform because it's better suited for enterprise use cases where they have full control over the brain and can bring their own llm. with that said, even with a cascading model, we can capture a decent amount of nuance with our asr model, and it also supports capturing audio events like laughter or coughing.

a true speech to speech conversational model will perform better on things like capturing tone, pronunciations, phonetics, etc., but i do believe we'll also get better at that on the asr side over time.
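
for readers unfamiliar with the term, here is a minimal sketch of what a cascading turn looks like: asr, then an llm, then tts, each stage swappable. every name here (`Transcript`, `run_turn`, the `fake_*` stubs, the bracketed event tags) is hypothetical and made up for illustration. it is not the elevenlabs api or how their platform is actually built.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Transcript:
    """Hypothetical ASR output: recognized text plus non-speech audio events."""
    text: str
    audio_events: List[str] = field(default_factory=list)


def run_turn(
    audio: bytes,
    asr: Callable[[bytes], Transcript],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn through a cascade: ASR -> LLM -> TTS."""
    transcript = asr(audio)
    # Assumption: audio events are passed to the LLM as inline bracketed tags,
    # so some nuance (laughter, coughing) survives the text bottleneck.
    tags = " ".join(f"[{event}]" for event in transcript.audio_events)
    prompt = f"{tags} {transcript.text}".strip()
    reply = llm(prompt)  # the "brain" is just a callable, so any LLM can be plugged in
    return tts(reply)


# Stub stages so the sketch runs without any external service.
def fake_asr(audio: bytes) -> Transcript:
    return Transcript(text=audio.decode("utf-8"), audio_events=["laughter"])


def fake_llm(prompt: str) -> str:
    return f"echo: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    print(run_turn(b"hello", fake_asr, fake_llm, fake_tts))
```

the point of the sketch is the tradeoff described above: the llm only ever sees text plus whatever the asr stage chooses to annotate, so tone and phonetics are lost unless they are explicitly captured, while a speech to speech model would consume the audio directly.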
