
DeepSeek-v3.1

(api-docs.deepseek.com)
776 points by wertyk
esafak ◴[] No.44977474[source]
It seems to trail Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B: https://artificialanalysis.ai/models/deepseek-v3-1-reasoning

Pricing: https://openrouter.ai/deepseek/deepseek-chat-v3.1

replies(2): >>44977550 #>>44981531 #
bigyabai ◴[] No.44977550[source]
Those Qwen3 2507 models are the local crème de la crème right now. If you've got any sort of GPU and ~32 GB of RAM to play with, the A3B one is great for pair-programming tasks.
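For anyone curious what pair-programming against a locally served A3B model looks like, here is a minimal Python sketch, assuming the model is already running behind llama.cpp's llama-server or LM Studio (both expose an OpenAI-compatible endpoint); the port and model id below are assumptions, not details from this thread:

    # Hypothetical sketch: talk to a locally hosted Qwen3-30B-A3B through an
    # OpenAI-compatible endpoint. Port and model id are assumed; adjust them
    # to whatever your local server actually reports.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="qwen3-30b-a3b-instruct-2507",  # assumed local model id
        messages=[
            {"role": "system", "content": "You are a concise pair-programming assistant."},
            {"role": "user", "content": "Suggest a cleaner way to write this nested loop: ..."},
        ],
        temperature=0.7,
    )
    print(resp.choices[0].message.content)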
replies(4): >>44977707 #>>44978006 #>>44978062 #>>44979710 #
pdimitar ◴[] No.44977707[source]
Do you happen to know if it can be run via an eGPU enclosure with, for example, an RTX 5090 inside, under Linux?

I've been considering buying a Linux workstation lately, and I want it to be fully AMD. But if I could just plug in an NVIDIA card via an eGPU enclosure for self-hosting LLMs, that would be amazing.

replies(3): >>44977887 #>>44977902 #>>44978104 #
gunalx ◴[] No.44977887[source]
You would still need drivers and all the stuff that makes NVIDIA on Linux awkward with an eGPU. (It's not necessarily terrible, just suboptimal.) I'd rather just add a second GPU inside the workstation, or run the LLM on your AMD GPU.
replies(1): >>44977904 #
pdimitar ◴[] No.44977904[source]
Oh, we can run LLMs efficiently with AMD GPUs now? Pretty cool, I haven't been following, thank you.
replies(4): >>44978437 #>>44984429 #>>44984563 #>>44989107 #
green7ea ◴[] No.44989107[source]
llama.cpp and LM Studio have a Vulkan backend that is pretty fast. I'm using it to run models on a Strix Halo laptop and it works well.
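For context, here is a minimal sketch of how that Vulkan path is typically driven from Python via llama-cpp-python; the build flag, model path, and settings are assumptions rather than details from this comment:

    # Hypothetical sketch: llama-cpp-python on top of a Vulkan-enabled llama.cpp
    # build (e.g. installed with CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python).
    # The GGUF path and settings below are assumptions.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/qwen3-30b-a3b-q4_k_m.gguf",  # assumed local file
        n_gpu_layers=-1,  # offload all layers to the GPU backend
        n_ctx=8192,       # modest context window to fit a ~32 GB machine
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a short docstring for a merge sort."}],
        max_tokens=256,
    )
    print(out["choices"][0]["message"]["content"])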