(qwen.ai)

314 points pretext | 1 comments | 10 Dec 25 16:13 UTC

sosodev ◴[10 Dec 25 16:55 UTC] No.46220123[source]▶

>>46219538 (OP) #

Does Qwen3-Omni support real-time conversation like GPT-4o? Looking at their documentation it doesn't seem like it does.

Are there any open-weight models that do? I'm not talking about speech to text -> LLM -> text to speech, btw; I mean a real voice <-> language model.

edit:

It does support real-time conversation! Has anybody here gotten that to work on local hardware? I'm particularly curious whether anybody has run it with a non-NVIDIA setup.

replies(4): >>46220228 #>>46222544 #>>46223129 #>>46224919 #

potatoman22 ◴[10 Dec 25 22:34 UTC] No.46224919[source]▶

>>46220123 #

From what I can tell, their official chat site doesn't have a native audio -> audio model yet. I like to test this with homophones (e.g. record and record) and by asking it to change its pitch or produce sounds.

replies(3): >>46225836 #>>46227448 #>>46227486 #

djtango ◴[11 Dec 25 03:53 UTC] No.46227448[source]▶

>>46224919 #

Is record a homophone? At least in the UK we use different pronunciations for the meanings. Re-cord for the verb, rec-ord for the noun.

replies(1): >>46238269 #

1. potatoman22 ◴[11 Dec 25 22:36 UTC] No.46238269[source]▶

>>46227448 #

I was mistaken about what homophone means!

Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model