    GPT-5.2 (openai.com)
    1053 points by atgctg | 12 comments
    zug_zug ◴[] No.46235131[source]
    For me the last remaining killer feature of ChatGPT is the quality of the voice chat. Do any of the competitors have something like that?
    replies(15): >>46235139 #>>46235151 #>>46235193 #>>46235277 #>>46235779 #>>46236133 #>>46236236 #>>46236283 #>>46236341 #>>46236399 #>>46236665 #>>46236951 #>>46237061 #>>46237082 #>>46237617 #
    1. Robdel12 ◴[] No.46235193[source]
    I have found Claude’s voice chat to be better. I only tried it recently because I liked ChatGPT’s enough, but I think I’m going to use Claude going forward. I find myself getting interrupted by ChatGPT a lot whenever I use it.
    replies(1): >>46235258 #
    2. lxgr ◴[] No.46235258[source]
    Claude’s voice chat isn’t “native” though, is it? It feels like it’s speech-to-text-to-LLM and back.
    replies(2): >>46235357 #>>46236680 #
    3. sosodev ◴[] No.46235357[source]
    You can test it by asking it to change the pitch of its voice, make specific sounds (like laughter), differentiate between words that are spelled the same but pronounced differently (record and record), etc.
    replies(2): >>46235438 #>>46236201 #
    4. lxgr ◴[] No.46235438{3}[source]
    Good idea, but an external “bolted on” LLM-based TTS would still pass that in many cases, right?
    replies(3): >>46235639 #>>46235768 #>>46235803 #
    5. barrkel ◴[] No.46235639{4}[source]
    The model handing it text to speak would have to annotate that text in order for the TTS to add the affect; the TTS wouldn't "remember" such instructions from an earlier speech-to-text stage.
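    Concretely, the annotation has to live in the text itself, e.g. SSML-style markup. A rough sketch (which tags, if any, a given TTS actually honors is an assumption here):

      # Illustrative only: the markup dialect (SSML here) depends on the TTS.
      plain = "Sure, I can keep doing that."

      annotated = (
          '<speak><prosody pitch="-15%" rate="slow">'
          "Sure, I can keep doing that."
          "</prosody></speak>"
      )

      # A stateless TTS handed `plain` has no memory of an earlier "use a deeper,
      # calmer voice" request; the affect survives only if the LLM re-emits markup
      # like `annotated` on every turn.
      print(plain)
      print(annotated)
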
    6. sosodev ◴[] No.46235768{4}[source]
    Yes, a sufficiently advanced marriage of TTS and LLM could pass a lot of these tests, though that kind of blurs the line between a native voice model and not.

    You would need:

    * An STT (ASR) model that outputs phonetics, not just words

    * An LLM fine-tuned to understand that and also output the proper tokens for prosody control, non-speech vocalizations, etc.

    * A TTS model that understands those tokens and properly generates the matching voice

    At that point I would probably argue that you've created a native voice model, even if it's still less nuanced than the proper voice-to-voice of something like 4o. The latency would likely be quite high, though. I'm pretty sure I've seen a couple of open-source projects that have done this type of setup, but I've not tried testing them.
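
    A very rough sketch of that cascade (all function names and token formats here are made-up placeholders, not any particular project's API):

      # Cascaded "fake native" voice pipeline: STT -> LLM -> TTS, with the
      # annotations carried as text between the stages. Illustrative stubs only.

      def asr_with_phonetics(audio: bytes) -> str:
          # 1. STT that keeps more than bare words (phonetics, laughter, emphasis).
          return 'say "record" (the noun) in a higher pitch <laughs>'

      def prosody_aware_llm(transcript: str) -> str:
          # 2. LLM fine-tuned to read those annotations and emit control tokens
          #    for prosody / non-speech vocalizations alongside its reply.
          return '<pitch level="high">Record, as in the noun.</pitch> <laugh/>'

      def tts_with_control_tokens(annotated_text: str) -> bytes:
          # 3. TTS that renders those control tokens as actual pitch changes,
          #    laughter, etc.
          return b"...synthesized audio..."

      def voice_turn(user_audio: bytes) -> bytes:
          # Three model calls per turn, which is where the extra latency comes from.
          return tts_with_control_tokens(prosody_aware_llm(asr_with_phonetics(user_audio)))

      if __name__ == "__main__":
          print(voice_turn(b"...user audio..."))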

    7. jablongo ◴[] No.46235803{4}[source]
    I recently tried to make ChatGPT sing "Mary Had a Little Lamb"; the result is atonal but vaguely resembles the melody, which is interesting.
    8. ◴[] No.46236201{3}[source]
    9. causalmodels ◴[] No.46236680[source]
    I just asked it, and it said that it uses the on-device TTS capabilities.
    replies(1): >>46237191 #
    10. furyofantares ◴[] No.46237191{3}[source]
    I find it very unlikely that it would be trained on that information or that Anthropic would put it in its context window, so it's very likely that it just made that answer up.
    replies(1): >>46237402 #
    11. causalmodels ◴[] No.46237402{4}[source]
    No, it did not make it up. I was curious, so I asked it to imitate a posh British accent imitating a South Brooklyn accent while having a head cold, and it explained that it didn't have fine-grained control over the audio output because it was using a TTS. I asked it how it knew that, and it pointed me towards [1] and highlighted the following.

    > As of May 29th, 2025, we have added ElevenLabs, which supports text to speech functionality in Claude for Work mobile apps.

    Tracked down the original source [2] and looked for additional updates but couldn't find anything.

    [1] https://simonwillison.net/2025/May/31/using-voice-mode-on-cl...

    [2] https://trust.anthropic.com/updates

    replies(1): >>46237525 #
    12. furyofantares ◴[] No.46237525{5}[source]
    If it did a web search, that's fine; I assumed it hadn't, since you hadn't linked to anything.

    Also, it being right doesn't mean it didn't just make up the answer.