khalic No.44317209
His dismissal of smaller, local models suggests he underestimates how much room they have to improve. Give phi4 a run and see what I mean.
replies(5): >>44317248 >>44317295 >>44317350 >>44317621 >>44317716
dist-epoch No.44317716
I tried the small local models. They are slow, much less capable, and, ironically, much more expensive to run than the frontier cloud models.
replies(1): >>44317776
khalic No.44317776
Phi4-mini runs at 20 T/s on a basic laptop CPU, and that's without any optimization. How is that slow?
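
For anyone who wants to reproduce a number like this, here is a minimal sketch using the Ollama Python client, assuming Ollama is installed and phi4-mini has already been pulled (the prompt is arbitrary). Ollama reports generation stats in nanoseconds, so tokens/sec falls out directly:

    # Rough tokens/sec measurement via the Ollama Python client.
    # Assumes a local Ollama server and `ollama pull phi4-mini` done beforehand.
    import ollama

    resp = ollama.generate(model="phi4-mini",
                           prompt="Explain TCP slow start in one paragraph.")
    tokens = resp["eval_count"]            # output tokens generated
    seconds = resp["eval_duration"] / 1e9  # eval_duration is in nanoseconds
    print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} T/s")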
replies(1): >>44317806
dist-epoch No.44317806
I was running Qwen3-32B locally even faster, at 70 T/s, and it was still way too slow for me. I generate thousands of tokens of output per request (not coding). Running locally, I could get about 6 million tokens per day and pay for the electricity, or I can get more tokens per day from Google Gemini 2.5 Flash for free.
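
(The 6 million figure is just the quoted rate times a day of wall clock, assuming generation runs around the clock; a quick sanity check:)

    # Sanity check: tokens/day at a sustained 70 T/s.
    tps = 70
    tokens_per_day = tps * 60 * 60 * 24   # seconds in a day
    print(f"{tokens_per_day / 1e6:.2f}M tokens/day")  # ~6.05M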

Running models locally is a privilege for the rich and those with too much disposable time.

replies(1): >>44423931
yencabulator No.44423931
Try Qwen3-30B-A3B. It's a mixture-of-experts model with only about 3B of its 30B parameters active per token, so its memory-bandwidth footprint looks more like a 3B model's, and it typically runs much faster.
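
A rough back-of-envelope for why active parameters dominate local decode speed: generating one token means reading roughly all active weights from memory, so speed is about memory bandwidth divided by active-weight bytes. The bandwidth and quantization figures below are illustrative assumptions, not measurements:

    # Back-of-envelope: decode T/s ~ memory_bandwidth / weight_bytes_read_per_token.
    # An MoE model only reads its active experts' weights for each token.
    # All numbers here are illustrative assumptions.
    bandwidth_gbps = 50.0    # assumed laptop memory bandwidth, GB/s
    bytes_per_param = 0.5    # ~4-bit quantized weights
    dense_gb_per_tok = 32e9 * bytes_per_param / 1e9  # dense 32B: ~16 GB/token
    moe_gb_per_tok = 3e9 * bytes_per_param / 1e9     # ~3B active: ~1.5 GB/token
    print(f"dense 32B: ~{bandwidth_gbps / dense_gb_per_tok:.0f} T/s")  # ~3 T/s
    print(f"30B-A3B:   ~{bandwidth_gbps / moe_gb_per_tok:.0f} T/s")    # ~33 T/s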