sosodev No.46220123
Does Qwen3-Omni support real-time conversation like GPT-4o? Looking at their documentation, it doesn't seem like it does.

Are there any open-weight models that do? I'm not talking about a speech-to-text -> LLM -> text-to-speech pipeline, btw; I mean a true voice <-> language model.

edit:

It does support real-time conversation! Has anybody here gotten that to work on local hardware? I'm particularly curious whether anybody has run it on a non-NVIDIA setup.
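
edit 2: for anyone else poking at this, turn-based local inference through transformers looks roughly like the sketch below. Untested; the class names, voice name, and generate() kwargs are from my reading of the Qwen3-Omni model card and may not match the current transformers release exactly.

    import soundfile as sf
    from transformers import Qwen3OmniMoeForConditionalGeneration, Qwen3OmniMoeProcessor
    from qwen_omni_utils import process_mm_info  # helper shipped in the Qwen repo

    MODEL = "Qwen/Qwen3-Omni-30B-A3B-Instruct"  # assumed checkpoint id

    model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
        MODEL, dtype="auto", device_map="auto"
    )
    processor = Qwen3OmniMoeProcessor.from_pretrained(MODEL)

    conversation = [
        {"role": "user", "content": [{"type": "audio", "audio": "question.wav"}]},
    ]

    # Build the text prompt and pull out the audio the way the model card does
    text = processor.apply_chat_template(
        conversation, add_generation_prompt=True, tokenize=False
    )
    audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
    inputs = processor(
        text=text, audio=audios, images=images, videos=videos,
        return_tensors="pt", padding=True,
    ).to(model.device)

    # generate() returns text token ids plus a waveform synthesized by the Talker
    text_ids, audio = model.generate(**inputs, speaker="Ethan")

    print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
    if audio is not None:
        sf.write("reply.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)

The real-time conversational mode streams these stages chunk-by-chunk instead of waiting for a full turn, which seems to be exactly the part the inference frameworks are still wrestling with.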

red2awn No.46222544
None of the inference frameworks (vLLM/SGLang) support the full model, let alone on non-NVIDIA hardware.
AndreSlavescu No.46223630
We actually deployed working speech-to-speech inference built on top of vLLM as the backbone. The main thing was supporting the "Talker" module, which is currently not supported on vLLM's qwen3-omni branch.

Check it out here: https://models.hathora.dev/model/qwen3-omni
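
For a sense of what the Talker adds, here's an illustrative sketch of the overall pipeline. The names here are hypothetical, not vLLM's API or our actual integration: Qwen3-Omni splits into a "Thinker" (the multimodal LLM producing text tokens and hidden states) and a "Talker" (an autoregressive speech-token model whose output a codec decoder turns back into a waveform). Stock vLLM runs the Thinker; the Talker and codec stages are the part that has to be added.

    from typing import Callable, Iterator
    import numpy as np

    def speech_to_speech(
        mic_chunks: Iterator[np.ndarray],
        thinker: Callable[[np.ndarray], tuple[list[int], np.ndarray]],
        talker: Callable[[list[int], np.ndarray], list[int]],
        codec_decode: Callable[[list[int]], np.ndarray],
    ) -> Iterator[np.ndarray]:
        """Stream audio out while audio is still coming in (hypothetical sketch)."""
        for chunk in mic_chunks:
            text_tokens, hidden = thinker(chunk)         # audio in -> text + states
            speech_tokens = talker(text_tokens, hidden)  # states -> codec tokens
            if speech_tokens:
                yield codec_decode(speech_tokens)        # codec tokens -> waveform out

The hard part in practice isn't this loop; it's wiring the Talker into the serving framework's scheduling and KV-cache machinery so all three stages batch efficiently.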

sosodev No.46224354
Is your work open source?
AndreSlavescu No.46278997
At the moment, no, unfortunately. However, on the open-source side, the vLLM team has since published a separate repository for omni models:

https://github.com/vllm-project/vllm-omni

I haven't yet tested whether it does full speech-to-speech, but it looks like a promising codebase for omni-modal models.
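
In the meantime, stock vLLM can already serve the Thinker for text-only responses to audio input. A rough, untested sketch; the model id and the multimodal chat schema below are assumptions, so check the current vLLM docs for qwen3-omni support:

    from vllm import LLM, SamplingParams

    llm = LLM(
        model="Qwen/Qwen3-Omni-30B-A3B-Instruct",  # assumed checkpoint id
        max_model_len=32768,
        limit_mm_per_prompt={"audio": 1},
    )

    out = llm.chat(
        [{
            "role": "user",
            "content": [
                {"type": "audio_url", "audio_url": {"url": "file:///tmp/question.wav"}},
                {"type": "text", "text": "Answer the question in the clip."},
            ],
        }],
        SamplingParams(max_tokens=256),
    )
    # Text only: without the Talker there is no audio reply
    print(out[0].outputs[0].text)

The missing piece relative to the deployment above is the Talker plus codec stage that turns the response back into audio, which is presumably what vllm-omni is meant to cover.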