khalic No.44317209
His dismissal of smaller, local models suggests he underestimates how much room they have to improve. Give phi4 a run and see what I mean.
replies(5): >>44317248 >>44317295 >>44317350 >>44317621 >>44317716
dist-epoch No.44317716
I tried the small local models. They are slow, much less capable, and, ironically, much more expensive to run than the frontier cloud models.
replies(1): >>44317776
khalic No.44317776
Phi4-mini runs at 20 T/s on a basic laptop CPU, and that's without any optimization. How is that slow?
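
For anyone who wants to reproduce a number like this, here is a minimal sketch using the Ollama Python client, assuming Ollama is installed and phi4-mini has already been pulled (the prompt is arbitrary). Ollama reports generation stats in nanoseconds, so tokens/sec falls out directly:

    # Rough tokens/sec measurement via the Ollama Python client.
    # Assumes a local Ollama server and `ollama pull phi4-mini` done beforehand.
    import ollama

    resp = ollama.generate(model="phi4-mini",
                           prompt="Explain TCP slow start in one paragraph.")
    tokens = resp["eval_count"]            # output tokens generated
    seconds = resp["eval_duration"] / 1e9  # eval_duration is in nanoseconds
    print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} T/s")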
replies(1): >>44317806
dist-epoch No.44317806
I was running Qwen3-32B locally even faster, at 70 T/s, and it was still way too slow for me. I generate thousands of tokens of output per request (not coding). Running locally, I could get about 6 million tokens per day and pay for the electricity, or I can get more tokens per day from Google Gemini 2.5 Flash for free.
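
(The 6 million figure is just the quoted rate times a day of wall clock, assuming generation runs around the clock; a quick sanity check:)

    # Sanity check: tokens/day at a sustained 70 T/s.
    tps = 70
    tokens_per_day = tps * 60 * 60 * 24   # seconds in a day
    print(f"{tokens_per_day / 1e6:.2f}M tokens/day")  # ~6.05M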

Running models locally is a privilege for the rich and those with too much disposable time.

replies(1): >>44423931
yencabulator No.44423931
Try Qwen3-30B-A3B. It's a mixture-of-experts model with only about 3B of its 30B parameters active per token, so its memory-bandwidth footprint looks more like a 3B model's, and it typically runs much faster.
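
A rough back-of-envelope for why active parameters dominate local decode speed: generating one token means reading roughly all active weights from memory, so speed is about memory bandwidth divided by active-weight bytes. The bandwidth and quantization figures below are illustrative assumptions, not measurements:

    # Back-of-envelope: decode T/s ~ memory_bandwidth / weight_bytes_read_per_token.
    # An MoE model only reads its active experts' weights for each token.
    # All numbers here are illustrative assumptions.
    bandwidth_gbps = 50.0    # assumed laptop memory bandwidth, GB/s
    bytes_per_param = 0.5    # ~4-bit quantized weights
    dense_gb_per_tok = 32e9 * bytes_per_param / 1e9  # dense 32B: ~16 GB/token
    moe_gb_per_tok = 3e9 * bytes_per_param / 1e9     # ~3B active: ~1.5 GB/token
    print(f"dense 32B: ~{bandwidth_gbps / dense_gb_per_tok:.0f} T/s")  # ~3 T/s
    print(f"30B-A3B:   ~{bandwidth_gbps / moe_gb_per_tok:.0f} T/s")    # ~33 T/s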