(qwen.ai)

314 points pretext | 4 comments | 10 Dec 25 16:13 UTC

1. terhechte ◴[10 Dec 25 17:54 UTC] No.46220981[source]▶

Is there a way to run these Omni models on a MacBook, quantized via GGUF or MLX? I know I can run them in LM Studio or llama.cpp, but neither has streaming microphone or streaming webcam support.

Qwen usually provides example code in Python that requires CUDA and a non-quantized model. Is there a good open-source project by now that supports this use case?

replies(2): >>46222558 #>>46222569 #

2. mobilio ◴[10 Dec 25 19:39 UTC] No.46222558[source]▶

>>46220981 (TP) #

Yes - there is a way: https://github.com/ggml-org/whisper.cpp

replies(1): >>46223198 #

3. tgtweak ◴[10 Dec 25 19:39 UTC] No.46222569[source]▶

>>46220981 (TP) #

You can probably follow the vLLM instructions for Omni here, then use the included voice demo HTML page to interface with it:

https://github.com/QwenLM/Qwen3-Omni#vllm-usage

https://github.com/QwenLM/Qwen3-Omni?tab=readme-ov-file#laun...

4. novaray ◴[10 Dec 25 20:21 UTC] No.46223198[source]▶

>>46222558 #

Whisper and the Qwen Omni models have completely different architectures, as far as I know.


Qwen3-Omni-Flash-2025-12-01: a next-generation native multimodal large model