
221 points | whitefables | 1 comment
varun_ch ◴[] No.41856480[source]
I’m curious how good the performance of local LLMs is on ‘outdated’ hardware like the author’s 2060. I have a desktop with a 2070 Super that could be fun to turn into an “AI server” if I had the time…
replies(7): >>41856521 #>>41856558 #>>41856559 #>>41856609 #>>41856875 #>>41856894 #>>41857543 #
khafra ◴[] No.41856609[source]
If you want to set up an AI server for your own use, it's exceedingly easy to install LM Studio and hit the "serve an API" button.
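(If you go this route: the server it starts speaks an OpenAI-compatible API, by default at http://localhost:1234/v1, so any OpenAI client library can talk to it. A minimal sketch with the Python openai package — the model name and api_key here are placeholders; LM Studio answers with whatever model you have loaded:)

    # Minimal sketch: query an LM Studio local server via its
    # OpenAI-compatible API (default address http://localhost:1234/v1).
    # Assumes a model is already loaded in the server.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1",
                    api_key="not-needed")  # any string works locally

    resp = client.chat.completions.create(
        model="local-model",  # placeholder; the loaded model responds
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)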

Testing performance this way, I got about 0.5-1.5 tokens per second with an 8GB 4-bit-quantized model on an old DL360 rack-mount server with 192GB RAM and two E5-2670 CPUs. I got about 20-50 tokens per second on my laptop with a mobile RTX 4080.
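(For anyone wanting to reproduce a rough tokens-per-second number against the same endpoint — a sketch that counts streamed chunks over wall time; a chunk isn’t exactly one token, so treat the result as an estimate:)

    # Rough throughput estimate: stream a completion and divide
    # chunks received by elapsed wall time. Chunk != token exactly,
    # so this only approximates tokens/sec.
    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1",
                    api_key="not-needed")

    start = time.time()
    chunks = 0
    stream = client.chat.completions.create(
        model="local-model",  # placeholder for the loaded model
        messages=[{"role": "user",
                   "content": "Write a short paragraph about GPUs."}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            chunks += 1
    elapsed = time.time() - start
    print(f"~{chunks / elapsed:.1f} tokens/sec over {elapsed:.1f}s")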

replies(1): >>41856694 #
1. taosx ◴[] No.41856694[source]
LM Studio is so nice; I was up and running in 5 minutes. ty