Qwen3-Omni-Flash-2025-12-01: a next-generation native multimodal large model

(qwen.ai)

314 points by pretext | 4 comments | 10 Dec 25 16:13 UTC

sosodev [10 Dec 25 16:55 UTC] No.46220123

>>46219538 (OP)

Does Qwen3-Omni support real-time conversation like GPT-4o? Looking at their documentation it doesn't seem like it does.

Are there any open-weight models that do? I'm not talking about speech-to-text -> LLM -> text-to-speech, by the way; I mean a real voice <-> language model.

edit:

It does support real-time conversation! Has anybody here gotten that to work on local hardware? I'm particularly curious whether anybody has run it on a non-NVIDIA setup.

potatoman22 [10 Dec 25 22:34 UTC] No.46224919

>>46220123

From what I can tell, their official chat site doesn't have a native audio -> audio model yet. I like to test this through homophones (e.g. record and record) and asking it to change its pitch or produce sounds.

dragonwriter [11 Dec 25 03:59 UTC] No.46227486

>>46224919

If you mean the verb for persisting something and the noun for the thing persisted, “record” and “record” are heteronyms (homographs that are not homophones). Incidentally, heteronyms are also what you would probably want in order to test what you are describing. Distinguishing homophones would test the use of context to understand meaning, but it wouldn’t tell you whether the logic works directly on audio or only on text transcribed from audio. Failing to distinguish heteronyms, by contrast, suggests that processing happens on text rather than directly on audio.

bakeman [11 Dec 25 04:21 UTC] No.46227622

>>46227486 (TP)

There are homophones of “record”, such as:

“He’s on record saying he broke the record for spinning a record.”

dragonwriter [11 Dec 25 05:13 UTC] No.46227911

>>46227622

True.

OTOH, my point still stands: the property suggested for testing can't be tested by checking whether the system distinguishes homophones, but it might be testable by checking whether it distinguishes heteronyms. (My guess that the intended record/record distinction was actually a pair of heteronyms, and that the only error was using the word “homophone” in place of “heteronym” rather than a flaw in the comment’s basic logic, is somewhat tangential to the main point.)

potatoman22 [11 Dec 25 22:37 UTC] No.46238285

>>46227486 (TP)

Ah I meant heteronyms. Thanks!