sure would be neat if these companies would release models that could run on consumer hardware
replies(2):
https://huggingface.co/mlx-community/Qwen3-Next-80B-A3B-Inst...
I usually use GPT-oss-120B with CPU MoE offloading. It writes at about 10 tokens/s, which is fast enough for the limited things I use it for. But I'm curious how Qwen3 Next will run, or whether I'll be able to offload it and get GPU acceleration at all.
(4090)
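For reference, CPU MoE offloading in llama.cpp looks roughly like this: all layers get "offloaded" to the GPU, but the expert (MoE) tensors for some number of layers are overridden back to system RAM, since they're the bulk of the weights but only a few are active per token. A minimal sketch, assuming a recent llama.cpp build with the `--n-cpu-moe` flag and a made-up quant filename; the layer count is something you'd tune to fit a 24 GB card:

```shell
# Keep attention/shared tensors on the GPU, push expert tensors to CPU.
# Filename and numbers are illustrative, not a tested config.
./llama-server \
  -m gpt-oss-120b-Q4_K_M.gguf \
  -ngl 99 \
  --n-cpu-moe 30 \
  -c 8192
```

Older builds express the same thing with a tensor-override regex instead, e.g. `-ot ".ffn_.*_exps.=CPU"`.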