
314 points by pretext | 1 comments
gardnr ◴[] No.46220742[source]
This is a 30B parameter MoE with 3B active parameters and is the successor to their previous 7B omni model. [1]

You can expect this model to have similar performance to the non-omni version. [2]

There aren't many open-weights omni models, so I consider this a big deal. I would use a model like this as an application's voice-and-vision front end, replacing the keyboard and monitor, while doing the heavy lifting with other tech behind the scenes. There is also a reasoning version, which might be a bit amusing in an interactive voice chat if it pronounces the thinking tokens aloud while working through to a final answer.

1. https://huggingface.co/Qwen/Qwen2.5-Omni-7B

2. https://artificialanalysis.ai/models/qwen3-30b-a3b-instruct
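For context, the "A3B" suffix refers to mixture-of-experts routing: each token activates only a few experts, so roughly 3B of the 30B parameters run per decoding step. A toy sketch of top-k MoE routing (all sizes, weights, and names here are illustrative, not Qwen's actual router):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_layer(token, experts, router_weights, k=2):
    """Route one token through the top-k of len(experts) expert FFNs.

    Only k experts run per token, which is why a "30B total / 3B active"
    model decodes far more cheaply than a dense 30B one.
    """
    # Router: one score per expert (toy linear router).
    scores = [sum(t * w for t, w in zip(token, ws)) for ws in router_weights]
    # Keep only the k best-scoring experts for this token.
    top = sorted(range(len(experts)), key=lambda i: -scores[i])[:k]
    gates = softmax([scores[i] for i in top])
    # Mix the selected experts' outputs; the other experts never run.
    out = [0.0] * len(token)
    for g, i in zip(gates, top):
        for d, v in enumerate(experts[i](token)):
            out[d] += g * v
    return out, top

# 8 toy experts; this router happens to prefer higher-indexed ones.
experts = [lambda t, s=i: [s * x for x in t] for i in range(8)]
router_weights = [[float(i), 0.0] for i in range(8)]
out, top = moe_layer([1.0, 2.0], experts, router_weights, k=2)
print(top)  # only these 2 of the 8 experts were executed
```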

replies(7): >>46221003 #>>46221130 #>>46221363 #>>46221587 #>>46222499 #>>46222576 #>>46229484 #
1. red2awn ◴[] No.46222499[source]
This is a stack of models:

- 650M Audio Encoder

- 540M Vision Encoder

- 30B-A3B LLM

- 3B-A0.3B Audio LLM

- 80M Transformer/200M ConvNet audio token to waveform
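Summing the numbers above gives a feel for the footprint. A small sketch (treating the encoders and vocoder as always active, and using the A-numbers for the MoE parts; that split of "active" is my approximation, not an official figure):

```python
# Parameter budget of the stacked components listed above.
M, B = 10**6, 10**9
components = {                       # (total, active per token)
    "audio encoder":        (650 * M, 650 * M),
    "vision encoder":       (540 * M, 540 * M),
    "LLM (30B-A3B MoE)":    (30 * B, 3 * B),
    "audio LLM (3B-A0.3B)": (3 * B, 300 * M),
    "token-to-waveform":    (280 * M, 280 * M),  # 80M Transformer + 200M ConvNet
}
total = sum(t for t, _ in components.values())
active = sum(a for _, a in components.values())
print(f"total ≈ {total / B:.2f}B, active per token ≈ {active / B:.2f}B")
```

So the whole stack holds about 34.5B parameters, while only roughly 4.8B participate in any one step.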

This is a closed-source weight update to their Qwen3-Omni model. They previously released the open-weight Qwen/Qwen3-Omni-30B-A3B-Instruct and a closed version, Qwen3-Omni-Flash.

You basically can't use this model right now, since none of the open-source inference frameworks have the model fully implemented. It works in transformers, but it's extremely slow.
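Even once faster runtimes support it, the 3B-active figure sets the single-stream decode ceiling: generation is usually memory-bandwidth-bound, since the active weights must be streamed from memory once per token. A back-of-envelope estimate (the hardware numbers are illustrative assumptions, not measurements):

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound MoE decoder.
def max_tokens_per_sec(active_params, bytes_per_param, bandwidth_gb_s):
    bytes_per_token = active_params * bytes_per_param  # weights read per token
    return bandwidth_gb_s * 1e9 / bytes_per_token

# 3B active params, bf16 weights (2 bytes), ~1 TB/s GPU memory bandwidth:
print(round(max_tokens_per_sec(3e9, 2, 1000)))  # ≈ 167, ignoring all overhead
```

A dense 30B model on the same hardware would top out around a tenth of that, which is the practical payoff of the A3B design.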