
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points by wertyk | 3 comments
esafak ◴[] No.44977474[source]
It seems to be behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B: https://artificialanalysis.ai/models/deepseek-v3-1-reasoning

Pricing: https://openrouter.ai/deepseek/deepseek-chat-v3.1

replies(2): >>44977550 #>>44981531 #
bigyabai ◴[] No.44977550[source]
Those Qwen3 2507 models are the local crème de la crème right now. If you've got any sort of GPU and ~32 GB of RAM to play with, the A3B one is great for pair-programming tasks.
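If you want to drive it from a script rather than a chat UI, a minimal sketch against a local Ollama install looks roughly like the following (untested here; the model tag is a guess, so check the Ollama library page for the exact A3B tag you pulled):

    # Sketch: ask a locally served Qwen3 A3B model a pair-programming question
    # via Ollama's HTTP chat API. Assumes Ollama is running on its default port
    # (11434) and an A3B tag has already been pulled; the tag below is an
    # assumption and may differ on your install.
    import requests

    MODEL = "qwen3:30b-a3b"  # assumption: adjust to whatever tag you actually pulled

    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": MODEL,
            "messages": [
                {"role": "user",
                 "content": "Review this function for off-by-one errors:\n"
                            "def last_n(xs, n): return xs[-n:]"}
            ],
            "stream": False,  # return a single JSON object instead of a stream
        },
        timeout=600,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])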
replies(4): >>44977707 #>>44978006 #>>44978062 #>>44979710 #
pdimitar ◴[] No.44977707[source]
Do you happen to know if it can be run via an eGPU enclosure with, e.g., an RTX 5090 inside, under Linux?

I've been considering buying a Linux workstation lately, and I want it to be all AMD. But if I could just plug in an NVIDIA card via an eGPU enclosure for self-hosting LLMs, that would be amazing.

replies(3): >>44977887 #>>44977902 #>>44978104 #
oktoberpaard ◴[] No.44978104[source]
I’m running Ollama on 2 eGPUs over Thunderbolt. It works well for me. You’re still dealing with an NVIDIA device, of course; the connection type isn't going to change that hassle.
replies(1): >>44978144 #
1. pdimitar ◴[] No.44978144[source]
Thank you for the validation. As much as I don't like NVIDIA's shenanigans on Linux, having a local LLM is very tempting and I might put my ideological problems to rest over it.

Though I have to ask: why two eGPUs? Is the LLM software smart enough to use any combination of GPUs you point it at?

replies(2): >>44978798 #>>44980758 #
2. arcanemachiner ◴[] No.44978798[source]
Yes, Ollama is very plug-and-play when it comes to multi-GPU setups.
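If you want to sanity-check that a loaded model actually ended up in GPU memory rather than spilling to system RAM, something like this works (a rough sketch: default port assumed, and verify the field names against your Ollama version's API docs):

    # Rough check of what Ollama currently has resident in VRAM.
    # Assumes the default Ollama port; "size_vram" is the field the API
    # reports for VRAM-resident bytes -- confirm against your version.
    import requests

    ps = requests.get("http://localhost:11434/api/ps", timeout=10).json()
    for m in ps.get("models", []):
        total = m.get("size", 0)
        in_vram = m.get("size_vram", 0)
        print(f"{m['name']}: {in_vram / 1e9:.1f} GB of {total / 1e9:.1f} GB in VRAM")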

llama.cpp probably is too, but I haven't tried it with a bigger model yet.
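llama.cpp does expose split controls through its Python bindings, though. A minimal sketch, assuming a local GGUF file and two similar-sized GPUs (the path and split ratios below are placeholders, not a tested config):

    # Sketch: split a GGUF model across two GPUs with llama-cpp-python.
    # The model path is a placeholder; tensor_split ratios are illustrative
    # and should roughly match each card's free VRAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="/models/qwen3-30b-a3b.Q4_K_M.gguf",  # placeholder path
        n_gpu_layers=-1,          # offload every layer to the GPUs
        tensor_split=[0.5, 0.5],  # proportion of the model placed on each GPU
        n_ctx=8192,
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain Rust lifetimes briefly."}]
    )
    print(out["choices"][0]["message"]["content"])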

3. SV_BubbleTime ◴[] No.44980758[source]
Just today, progress was released on parallelizing WAN video generation across multiple GPUs. LLMs are much easier to split up.