It seems to rank behind Qwen3 235B 2507 Reasoning (which I like) and gpt-oss-120B:
https://artificialanalysis.ai/models/deepseek-v3-1-reasoning
replies(2):
I've been considering buying a Linux workstation, and I want it to be full AMD. But if I could just plug in an NVIDIA card via an eGPU enclosure for self-hosting LLMs, that would be amazing.
llama-server.exe -ngl 99 -m Qwen3-14B-Q6_K.gguf (-ngl 99 offloads all layers to the GPU)
Then point your browser at localhost:8080 for the built-in WebUI (it's basic but does the job; screenshots can be found on Google). You can also hook up more advanced frontends, because llama.cpp exposes an OpenAI-compatible API.
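Since the server speaks the OpenAI chat-completions protocol, any client works. Here's a minimal sketch using only the Python standard library, assuming llama-server is running on the default localhost:8080 (the `ask` helper and model name are illustrative, not part of llama.cpp itself):

```python
# Minimal client for llama.cpp's OpenAI-compatible endpoint.
# Assumes llama-server is listening on localhost:8080 (its default).
import json
import urllib.request

def build_chat_request(prompt, model="Qwen3-14B-Q6_K"):
    """Build an OpenAI-style chat completion payload."""
    return {
        # llama-server serves whatever model it loaded; the field is
        # mostly informational here.
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(prompt, base_url="http://localhost:8080"):
    """POST a chat request and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the wire format matches OpenAI's, the official openai Python package also works if you set its base_url to the local server.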