
314 points pretext | 1 comments
sosodev ◴[] No.46220123[source]
Does Qwen3-Omni support real-time conversation like GPT-4o? Looking at their documentation it doesn't seem like it does.

Are there any open-weight models that do? I'm not talking about speech to text -> LLM -> text to speech, btw; I mean a real voice <-> language model.

edit:

It does support real-time conversation! Has anybody here gotten that to work on local hardware? I'm particularly curious whether anybody has run it on a non-NVIDIA setup.

potatoman22 ◴[] No.46224919[source]
From what I can tell, their official chat site doesn't have a native audio -> audio model yet. I like to test this with homophones (e.g. record and record) and by asking it to change its pitch or produce sounds.
djtango ◴[] No.46227448[source]
Is record a homophone? At least in the UK we use different pronunciations for the meanings. Re-cord for the verb, rec-ord for the noun.
1. potatoman22 ◴[] No.46238269[source]
I was mistaken about what homophone means!