I’m curious how well local LLMs perform on ‘outdated’ hardware like the author’s 2060. I have a desktop with a 2070 Super that could be fun to turn into an “AI server” if I had the time…
Edit: I've loaded Llama 3.1 8B Instruct (GGUF) and got 12.61 tok/sec; Llama 3.2 3B runs at about 80 tok/sec.
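For anyone wanting to reproduce numbers like these, here's a minimal sketch of a tok/sec benchmark using llama-cpp-python (one common way to run GGUF models; the original comment doesn't say which runtime was used). The model path is hypothetical, and the install is assumed to be built with CUDA support so `n_gpu_layers` actually offloads to the GPU:

```python
# Minimal tokens-per-second benchmark sketch (assumes
# `pip install llama-cpp-python` built with CUDA; the
# model path below is hypothetical -- point it at your GGUF file).
import time
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.1-8b-instruct-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=False,
)

prompt = "Explain in one paragraph why the sky is blue."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

n_tokens = out["usage"]["completion_tokens"]
print(f"{n_tokens} tokens in {elapsed:.2f}s -> {n_tokens / elapsed:.2f} tok/sec")
```

Note this measures end-to-end time (prompt processing plus generation), so for short prompts it roughly approximates generation speed; tools like llama.cpp's `llama-bench` report prompt and generation throughput separately.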