Qwen3-Omni-Flash-2025-12-01: a next-generation native multimodal large model

(qwen.ai)

314 points pretext | 1 comments | 10 Dec 25 16:13 UTC

sosodev ◴[10 Dec 25 16:55 UTC] No.46220123[source]▶

>>46219538 (OP) #

Does Qwen3-Omni support real-time conversation like GPT-4o? Looking at their documentation it doesn't seem like it does.

Are there any open-weight models that do? I'm not talking about speech-to-text -> LLM -> text-to-speech, by the way; I mean a real voice <-> language model.

edit:

It does support real-time conversation! Has anybody here gotten that to work on local hardware? I'm particularly curious whether anybody has run it with a non-NVIDIA setup.

replies(4): >>46220228 #>>46222544 #>>46223129 #>>46224919 #

ivape ◴[10 Dec 25 20:16 UTC] No.46223129[source]▶

>>46220123 #

That's exciting. I doubt there are any polished local voice chat apps yet that you can easily plug this into (I doubt the user experience is "there" yet). Even stuff like SillyTavern is near unusable; there's lots of work to be done on the local front. Local voice models are what's going to enable that whole Minority Report workflow soon enough (especially if commands and intent are determined at the local level, and the meat of the prompt is handled by a larger remote model).

I think this is the part of programming that is the new field. There will be tons of work for those who can build the new workflows, which will need to be primarily natural-language driven.

replies(1): >>46223285 #

1. sosodev ◴[10 Dec 25 20:27 UTC] No.46223285[source]▶

>>46223129 #

I did find this app: https://github.com/gabber-dev/gabber

The creator posted a little demo of it working with Qwen3 Omni that is quite impressive: https://www.youtube.com/watch?v=5DBFVe3cLto

He didn't include any details about how the model was running, though.
