
577 points by simonw | 1 comment
joelthelion ◴[] No.44724227[source]
Apart from using a Mac, what can you use for inference with reasonable performance? Is a Mac the only realistic option at the moment?
replies(6): >>44724398 #>>44724419 #>>44724553 #>>44724563 #>>44724959 #>>44727049 #
1. badsectoracula ◴[] No.44727049[source]
An Nvidia GPU is the most common answer, but personally i've done all my LLM use locally, mainly with Mistral Small 3.1/3.2-based models running via llama.cpp on an AMD RX 7900 XTX GPU. It only gives you ~4.71 tokens per second, but that is fast enough for a lot of uses. For example, last month or so i wrote a raytracer[0][1] in C with Devstral Small 1.0 (based on Mistral Small 3.1). It wasn't "vibe coding" so much as a "co-op" where i'd go back and forth with a chat interface (koboldcpp): i'd ask the LLM to implement some feature, then switch to the editor and start writing code that used that feature while the LLM was still generating it in the background. Or, more often, i'd fix bugs in the LLM's code :-P.
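
To give an idea of the granularity involved, here is a minimal, hypothetical sketch in C (not taken from the linked raytracer) of the kind of small, self-contained routine i'd ask the LLM for - a ray-sphere intersection test:

    /* Hypothetical example: ray-sphere intersection, the sort of small,
     * self-contained routine one might ask the LLM to produce. */
    #include <math.h>
    #include <stdio.h>

    typedef struct { double x, y, z; } vec3;

    static vec3 v_sub(vec3 a, vec3 b) { return (vec3){a.x - b.x, a.y - b.y, a.z - b.z}; }
    static double v_dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    /* Returns the distance t along the ray to the nearest hit, or -1.0 on a miss.
     * ro = ray origin, rd = normalized ray direction, c = sphere center, r = radius. */
    static double hit_sphere(vec3 ro, vec3 rd, vec3 c, double r)
    {
        vec3 oc = v_sub(ro, c);
        double b = v_dot(oc, rd);                      /* half-b of the quadratic */
        double disc = b * b - (v_dot(oc, oc) - r * r); /* discriminant */
        if (disc < 0.0)
            return -1.0;                               /* ray misses the sphere */
        double t = -b - sqrt(disc);                    /* nearest of the two roots */
        return t > 0.0 ? t : -1.0;
    }

    int main(void)
    {
        vec3 ro = {0, 0, 0};          /* camera at the origin */
        vec3 rd = {0, 0, -1};         /* looking down -Z */
        vec3 center = {0, 0, -5};     /* unit sphere 5 units away */
        double t = hit_sphere(ro, rd, center, 1.0);
        printf("hit at t = %.2f\n", t); /* expect 4.00 (front of the sphere) */
        return 0;
    }

Each request stays roughly this size, which is why ~4.71 tokens/second is workable: the model can grind away at a routine like this in the background while you keep editing elsewhere.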

FWIW, GPU aside, my PC isn't particularly new - it's a 5-6 year old machine that was originally the cheapest money could buy and became "decent" when i upgraded it ~5 years ago. i only added the GPU around Christmas, as prices were dropping because AMD was about to release its new GPUs.

[0] https://i.imgur.com/FevOm0o.png

[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...