It seems to be behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B:
https://artificialanalysis.ai/models/deepseek-v3-1-reasoning
replies(2):
While the memory bandwidth is decent, you still need matmuls and other compute operations for LLMs, which, again, it's pretty slow at (rough roofline sketch after the links).
[0]: https://old.reddit.com/r/LocalLLaMA/comments/1dcdit2/p40_ben...

[1]: https://developer.nvidia.com/cuda-gpus
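
A back-of-the-envelope roofline sketch of the bandwidth-vs-compute point, in plain Python. The P40 figures (~346 GB/s, ~12 FP32 TFLOPS) are approximate public specs, not measurements, and the 13B-model / 8 GB quantized example and the "2 FLOPs per parameter per token" rule are illustrative assumptions:

    # Approximate Tesla P40 specs (assumptions, not measurements).
    # FP32 is the realistic compute ceiling here, since the P40's
    # FP16 rate is far lower than its FP32 rate.
    BANDWIDTH_GBS = 346    # memory bandwidth, GB/s (approx.)
    COMPUTE_TFLOPS = 12    # FP32 throughput, TFLOPS (approx.)

    def decode_tokens_per_sec(model_gb: float) -> float:
        # Generating one token streams every weight from VRAM once,
        # so decode is bandwidth-bound: tokens/s ~ bandwidth / size.
        return BANDWIDTH_GBS / model_gb

    def prefill_tokens_per_sec(params_b: float) -> float:
        # Prompt processing batches many tokens, so it is
        # compute-bound; a common approximation is ~2 FLOPs per
        # parameter per token.
        flops_per_token = 2 * params_b * 1e9
        return COMPUTE_TFLOPS * 1e12 / flops_per_token

    # Hypothetical example: a 13B model quantized to ~8 GB.
    print(f"decode  ~{decode_tokens_per_sec(8):.0f} tok/s (bandwidth-bound)")
    print(f"prefill ~{prefill_tokens_per_sec(13):.0f} tok/s (compute-bound)")

Under these assumptions decode looks fine (~43 tok/s), but prefill tops out around ~460 tok/s, so long prompts run into the compute ceiling rather than the memory bus, which is why bandwidth alone doesn't tell the whole story.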