I’m curious about how good the performance with local LLMs is on ‘outdated’ hardware like the author’s 2060. I have a desktop with a 2070 super that it could be fun to turn into an “AI server” if I had the time…
I am using an old laptop with a GTX 1060 (6 GB VRAM) to run a home server with Ubuntu and Ollama. Thanks to quantization, Ollama can run 7B/8B models even on an 8-year-old laptop GPU with 6 GB of VRAM.
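For anyone wanting to try this, a rough sketch of the setup (model tags and sizes are examples; Ollama's default 8B tags ship as ~4-bit quants, which is what makes them fit in 6 GB):

```shell
# Install Ollama on Ubuntu (official install script)
curl -fsSL https://ollama.com/install.sh | sh

# Pull an 8B model; the default tag is a ~4-bit quantization,
# roughly 4-5 GB of weights, so it fits in 6 GB of VRAM
ollama pull llama3.1:8b

# Chat with it interactively
ollama run llama3.1:8b

# Check what's loaded and how much of it landed on the GPU
ollama ps
```

The back-of-envelope math: an 8B-parameter model at 16-bit would need ~16 GB just for weights, but at 4 bits it's ~4 GB, leaving headroom for the KV cache on a 6 GB card. If a model only partially fits, Ollama splits layers between GPU and CPU, which still works but is noticeably slower.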