It seems to be behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B:
https://artificialanalysis.ai/models/deepseek-v3-1-reasoning
replies(2):
While the memory bandwidth is decent, you still need matmuls and other compute operations for LLMs, which, again, it's pretty slow at (rough roofline sketch after the links).
[0]: https://old.reddit.com/r/LocalLLaMA/comments/1dcdit2/p40_ben...

[1]: https://developer.nvidia.com/cuda-gpus
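
A back-of-the-envelope roofline sketch of the bandwidth-vs-compute point, in plain Python. The P40 figures (~346 GB/s, ~12 FP32 TFLOPS) are approximate public specs, not measurements, and the 13B-model / 8 GB quantized example and the "2 FLOPs per parameter per token" rule are illustrative assumptions:

    # Approximate Tesla P40 specs (assumptions, not measurements).
    # FP32 is the realistic compute ceiling here, since the P40's
    # FP16 rate is far lower than its FP32 rate.
    BANDWIDTH_GBS = 346    # memory bandwidth, GB/s (approx.)
    COMPUTE_TFLOPS = 12    # FP32 throughput, TFLOPS (approx.)

    def decode_tokens_per_sec(model_gb: float) -> float:
        # Generating one token streams every weight from VRAM once,
        # so decode is bandwidth-bound: tokens/s ~ bandwidth / size.
        return BANDWIDTH_GBS / model_gb

    def prefill_tokens_per_sec(params_b: float) -> float:
        # Prompt processing batches many tokens, so it is
        # compute-bound; a common approximation is ~2 FLOPs per
        # parameter per token.
        flops_per_token = 2 * params_b * 1e9
        return COMPUTE_TFLOPS * 1e12 / flops_per_token

    # Hypothetical example: a 13B model quantized to ~8 GB.
    print(f"decode  ~{decode_tokens_per_sec(8):.0f} tok/s (bandwidth-bound)")
    print(f"prefill ~{prefill_tokens_per_sec(13):.0f} tok/s (compute-bound)")

Under these assumptions decode looks fine (~43 tok/s), but prefill tops out around ~460 tok/s, so long prompts run into the compute ceiling rather than the memory bus, which is why bandwidth alone doesn't tell the whole story.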