(geek.sg)

221 points whitefables | 2 comments | 16 Oct 24 05:26 UTC | HN request time: 0.003s | source

Show context

varun_ch ◴[16 Oct 24 07:27 UTC] No.41856480[source]▶

I’m curious about how good the performance with local LLMs is on ‘outdated’ hardware like the author’s 2060. I have a desktop with a 2070 super that it could be fun to turn into an “AI server” if I had the time…

replies(7): >>41856521 #>>41856558 #>>41856559 #>>41856609 #>>41856875 #>>41856894 #>>41857543 #

1. khafra ◴[16 Oct 24 07:49 UTC] No.41856609[source]▶

>>41856480 #

If you want to set up an AI server for your own use, it's exceedingly easy to install LM Studio and hit the "serve an API" button.

Testing performance this way, I got about 0.5-1.5 tokens per second with an 8GB 4bit quantized model on an old DL360 rack-mount server with 192GB RAM and 2 E5-2670 CPUs. I got about 20-50 tokens per second on my laptop with a mobile RTX 4080.

replies(1): >>41856694 #

2. taosx ◴[16 Oct 24 08:02 UTC] No.41856694[source]▶

>>41856609 (TP) #

LM studio is so nice, I'm up and running in 5 minutes. ty

↑

I Self-Hosted Llama 3.2 with Coolify on My Home Server