    GPT-5.2 (openai.com)
    1053 points by atgctg | 12 comments
    zug_zug ◴[] No.46235131[source]
    For me the last remaining killer feature of ChatGPT is the quality of the voice chat. Do any of the competitors have something like that?
    replies(15): >>46235139 #>>46235151 #>>46235193 #>>46235277 #>>46235779 #>>46236133 #>>46236236 #>>46236283 #>>46236341 #>>46236399 #>>46236665 #>>46236951 #>>46237061 #>>46237082 #>>46237617 #
    1. Robdel12 ◴[] No.46235193[source]
    I have found Claude’s voice chat to be better. I only tried it recently because I liked ChatGPT’s enough, but I think I’m going to use Claude going forward. I find myself getting interrupted by ChatGPT a lot whenever I use it.
    replies(1): >>46235258 #
    2. lxgr ◴[] No.46235258[source]
    Claude’s voice chat isn’t “native” though, is it? It feels like it’s speech-to-text-to-LLM and back.
    replies(2): >>46235357 #>>46236680 #
    3. sosodev ◴[] No.46235357[source]
    You can test it by asking it to change the pitch of its voice, make specific sounds (like laughter), differentiate between words that are spelled the same but pronounced differently (record and record), etc.
    replies(2): >>46235438 #>>46236201 #
    4. lxgr ◴[] No.46235438{3}[source]
    Good idea, but an external “bolted on” LLM-based TTS would still pass that in many cases, right?
    replies(3): >>46235639 #>>46235768 #>>46235803 #
    5. barrkel ◴[] No.46235639{4}[source]
    The model handing it text to speak would have to annotate that text in order for the TTS to add the affect; the TTS wouldn't "remember" such instructions from an earlier speech-to-text stage.
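    Concretely, the annotation has to live in the text itself, e.g. SSML-style markup. A rough sketch (which tags, if any, a given TTS actually honors is an assumption here):

      # Illustrative only: the markup dialect (SSML here) depends on the TTS.
      plain = "Sure, I can keep doing that."

      annotated = (
          '<speak><prosody pitch="-15%" rate="slow">'
          "Sure, I can keep doing that."
          "</prosody></speak>"
      )

      # A stateless TTS handed `plain` has no memory of an earlier "use a deeper,
      # calmer voice" request; the affect survives only if the LLM re-emits markup
      # like `annotated` on every turn.
      print(plain)
      print(annotated)
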
    6. sosodev ◴[] No.46235768{4}[source]
    Yes, a sufficiently advanced marriage of TTS and LLM could pass a lot of these tests, though that kind of blurs the line between a native voice model and not.

    You would need:

    * An STT (ASR) model that outputs phonetics, not just words

    * An LLM fine-tuned to understand that and also output the proper tokens for prosody control, non-speech vocalizations, etc.

    * A TTS model that understands those tokens and properly generates the matching voice

    At that point I would probably argue that you've created a native voice model, even if it's still less nuanced than the proper voice-to-voice of something like 4o. The latency would likely be quite high, though. I'm pretty sure I've seen a couple of open-source projects that have done this type of setup, but I've not tried testing them.
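
    A very rough sketch of that cascade (all function names and token formats here are made-up placeholders, not any particular project's API):

      # Cascaded "fake native" voice pipeline: STT -> LLM -> TTS, with the
      # annotations carried as text between the stages. Illustrative stubs only.

      def asr_with_phonetics(audio: bytes) -> str:
          # 1. STT that keeps more than bare words (phonetics, laughter, emphasis).
          return 'say "record" (the noun) in a higher pitch <laughs>'

      def prosody_aware_llm(transcript: str) -> str:
          # 2. LLM fine-tuned to read those annotations and emit control tokens
          #    for prosody / non-speech vocalizations alongside its reply.
          return '<pitch level="high">Record, as in the noun.</pitch> <laugh/>'

      def tts_with_control_tokens(annotated_text: str) -> bytes:
          # 3. TTS that renders those control tokens as actual pitch changes,
          #    laughter, etc.
          return b"...synthesized audio..."

      def voice_turn(user_audio: bytes) -> bytes:
          # Three model calls per turn, which is where the extra latency comes from.
          return tts_with_control_tokens(prosody_aware_llm(asr_with_phonetics(user_audio)))

      if __name__ == "__main__":
          print(voice_turn(b"...user audio..."))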

    7. jablongo ◴[] No.46235803{4}[source]
    I recently tried to make ChatGPT sing "Mary Had a Little Lamb"; the result is atonal but vaguely resembles the melody, which is interesting.
    8. ◴[] No.46236201{3}[source]
    9. causalmodels ◴[] No.46236680[source]
    I just asked it, and it said that it uses the on-device TTS capabilities.
    replies(1): >>46237191 #
    10. furyofantares ◴[] No.46237191{3}[source]
    I find it very unlikely that it would be trained on that information or that Anthropic would put it in its context window, so it's very likely that it just made that answer up.
    replies(1): >>46237402 #
    11. causalmodels ◴[] No.46237402{4}[source]
    No, it did not make it up. I was curious, so I asked it to imitate a posh British accent imitating a South Brooklyn accent while having a head cold, and it explained that it didn't have fine-grained control over the audio output because it was using a TTS. I asked it how it knew that, and it pointed me towards [1] and highlighted the following.

    > As of May 29th, 2025, we have added ElevenLabs, which supports text to speech functionality in Claude for Work mobile apps.

    Tracked down the original source [2] and looked for additional updates but couldn't find anything.

    [1] https://simonwillison.net/2025/May/31/using-voice-mode-on-cl...

    [2] https://trust.anthropic.com/updates

    replies(1): >>46237525 #
    12. furyofantares ◴[] No.46237525{5}[source]
    If it did a web search, that's fine; I assumed it hadn't, since you hadn't linked to anything.

    Also, it being right doesn't mean it didn't just make up the answer.