
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points by wertyk | 5 comments
esafak ◴[] No.44977474[source]
It seems to be behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B: https://artificialanalysis.ai/models/deepseek-v3-1-reasoning

Pricing: https://openrouter.ai/deepseek/deepseek-chat-v3.1

replies(2): >>44977550 #>>44981531 #
bigyabai ◴[] No.44977550[source]
Those Qwen3 2507 models are the local crème de la crème right now. If you've got any sort of GPU and ~32 GB of RAM to play with, the A3B one is great for pair-programming tasks.
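
A minimal sketch of running it with llama.cpp (untested; the GGUF filename is a placeholder for whichever Qwen3-30B-A3B quant you download - a roughly 4-bit quant is on the order of 18-19 GB, which is why ~32 GB of RAM is comfortable):

# -ngl sets how many layers go to the GPU; lower it if the model doesn't fit in VRAM
llama-server -ngl 99 -c 16384 -m Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf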
replies(4): >>44977707 #>>44978006 #>>44978062 #>>44979710 #
pdimitar ◴[] No.44977707[source]
Do you happen to know if it can be run via an eGPU enclosure with, for example, an RTX 5090 inside, under Linux?

I've been considering buying a Linux workstation lately and I want it to be full AMD. But if I could just plug in an NVIDIA card via an eGPU enclosure for self-hosting LLMs, that would be amazing.

replies(3): >>44977887 #>>44977902 #>>44978104 #
gunalx ◴[] No.44977887[source]
You would still need the drivers and all the stuff that makes NVIDIA difficult on Linux, plus the eGPU on top. (It's not necessarily terrible, just suboptimal.) I'd rather just add the second GPU inside the workstation, or run the LLM on your AMD GPU.
replies(1): >>44977904 #
1. pdimitar ◴[] No.44977904[source]
Oh, we can run LLMs efficiently with AMD GPUs now? Pretty cool, I haven't been following, thank you.
replies(4): >>44978437 #>>44984429 #>>44984563 #>>44989107 #
2. DarkFuture ◴[] No.44978437[source]
I've been running LLMs on my Radeon 7600 XT 16GB for the past 2-3 months without issues (Windows 11). I've been using llama.cpp only. The only thing from AMD I installed (apart from the latest Radeon drivers) is the "AMD HIP SDK" (very straightforward installer). After unzipping the llama.cpp build (the zip from the GitHub releases page must contain hip-radeon in the name), all I do is this:

llama-server.exe -ngl 99 -m Qwen3-14B-Q6_K.gguf

Then I connect to llama.cpp in a browser at localhost:8080 for the WebUI (it's basic but does the job; screenshots can be found on Google). You can also connect more advanced interfaces to it, because llama.cpp actually exposes an OpenAI-compatible API.
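
For example, a rough sketch of hitting that API with curl from a Unix-style shell (llama-server largely ignores the "model" field, so any name works, and no real API key is needed):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-14B-Q6_K", "messages": [{"role": "user", "content": "Hello"}]}'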

3. bavell ◴[] No.44984429[source]
IDK about "efficiently", but we've been able to run LLMs locally on AMD for 1.5-2 years now.
4. Plasmoid2000ad ◴[] No.44984563[source]
Yes - I'm running LM Studio on Windows on a 6800 XT, and everything works more or less out of the box, always using the Vulkan llama.cpp backend on the GPU, I believe.

There's also ROCm, though that's not working for me in LM Studio at the moment. I used it early last year to get some LLMs and Stable Diffusion running. As far as I can tell, it used to be faster, but the Vulkan implementations have caught up or something, so the mucking about often isn't worth it. I believe ROCm is hit or miss for a lot of people, especially on Windows.

5. green7ea ◴[] No.44989107[source]
llama.cpp and LM Studio have a Vulkan backend which is pretty fast. I'm using it to run models on a Strix Halo laptop and it works pretty well.
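
For reference, a rough sketch of building llama.cpp with the Vulkan backend enabled (per the upstream build docs; you also need the Vulkan SDK and drivers installed, and the model filename below is just an example):

# enable the Vulkan backend at build time
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
# then run as usual, offloading layers to the GPU
./build/bin/llama-server -ngl 99 -m Qwen3-14B-Q6_K.gguf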