
326 points by threeturn | 1 comment

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running, with which runtime (e.g., Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, discrete or integrated GPU, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation as well, and I'll be happy to share the results when it's done.

Thanks! Andrea.

1. loudmax No.45774702
I have a desktop computer with 128G of RAM and an RTX 3090 with 24G of VRAM. I use this to tinker with different models using llama.cpp and ComfyUI. I managed to get a heavily quantized instance of DeepSeek R1 running on it by following instructions from the Level1Techs forums, but it's far too slow to be useful. GPT-OSS-120b is surprisingly good, though again too quantized and too slow to be more than a toy.
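For anyone curious what that setup looks like in practice, here's a minimal sketch using llama-cpp-python with partial GPU offload, which is the usual way to split a big GGUF model between 24G of VRAM and system RAM. The model path and layer count are illustrative guesses, not the exact configuration above:

    from llama_cpp import Llama

    # Offload as many layers as fit in VRAM; the rest stay in system RAM.
    # n_gpu_layers is model-dependent -- this value is a guess, not a recipe.
    llm = Llama(
        model_path="gpt-oss-120b-Q4_K_M.gguf",  # hypothetical local GGUF file
        n_gpu_layers=30,
        n_ctx=4096,
    )

    out = llm("Write a function that reverses a linked list.", max_tokens=256)
    print(out["choices"][0]["text"])

The more layers you can push onto the GPU, the faster it runs; once most layers spill to CPU, token throughput drops to the "toy" speeds described above.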

For actual real work, I use Claude.

If you want to use an open-weights model to get real work done, the sensible thing would be to rent a GPU in the cloud. I'd be inclined to run llama.cpp because I know it well enough, but vLLM would make more sense for models that run entirely on the GPU.
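A sketch of the vLLM route on a rented GPU, using its offline Python API; the model name is just an example of something that fits comfortably in VRAM, swap in whatever you're actually renting for:

    from vllm import LLM, SamplingParams

    # vLLM keeps the whole model on the GPU, so pick one that fits in VRAM.
    llm = LLM(model="Qwen/Qwen2.5-Coder-7B-Instruct")  # example model
    params = SamplingParams(temperature=0.2, max_tokens=256)

    outputs = llm.generate(
        ["Refactor this loop into a list comprehension: ..."], params
    )
    print(outputs[0].outputs[0].text)

vLLM's batched serving throughput is what makes it the better fit once the model is fully GPU-resident; llama.cpp's advantage is mainly the CPU/GPU split shown earlier.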