1480 points sandslash | 19 comments
1. khalic ◴[] No.44317209[source]
His dismissal of smaller and local models suggests he underestimates their improvement potential. Give phi4 a run and see what I mean.
replies(5): >>44317248 #>>44317295 #>>44317350 #>>44317621 #>>44317716 #
2. TeMPOraL ◴[] No.44317248[source]
He ain't dismissing them. Comparing local/"open" models to Linux (and closed services to Windows and macOS) is high praise. It's also accurate.
replies(1): >>44317671 #
3. sriram_malhar ◴[] No.44317295[source]
Of all the things you could suggest, a lack of understanding is not one that can be pinned on Karpathy. He does know his technical stuff.
replies(1): >>44317657 #
4. diggan ◴[] No.44317350[source]
> suggests a lack of understanding of these smaller models capabilities

If anything, you're showing a lack of understanding of what he was talking about. The context is this specific time, where we're early in an ecosystem and things are expensive and likely centralized (a la mainframes), but if his analogy/prediction is correct, we'll have a "Linux" moment in the future where that equation changes (again) and local models are competitive.

And while I'm a huge fan of local models and run them for maybe 60-70% of what I do with LLMs, they're nowhere near the proprietary ones today, sadly. I want them to be, really badly, but it's important to be realistic here and recognize the difference between what a normal consumer can run and what the current "mainframes" can run.

replies(2): >>44317720 #>>44317744 #
5. mprovost ◴[] No.44317621[source]
You can disagree with his conclusions but I don't think his understanding of small models is up for debate. This is the person who created micrograd/makemore/nanoGPT and who has produced a ton of educational materials showing how to build small and local models.
replies(1): >>44317729 #
6. khalic ◴[] No.44317657[source]
We all have blind spots
replies(1): >>44317692 #
7. khalic ◴[] No.44317671[source]
This is a bad comparison
8. diggan ◴[] No.44317692{3}[source]
Sure, but maybe suggesting that the person who literally spent countless hours educating others on how to build small models locally from scratch is lacking knowledge about local small models is going a bit beyond "people have blind spots".
replies(1): >>44317749 #
9. dist-epoch ◴[] No.44317716[source]
I tried the local small models. They are slow, much less capable, and ironically much more expensive to run than the frontier cloud models.
replies(1): >>44317776 #
10. khalic ◴[] No.44317720[source]
He understands the technical part, of course; I was referring to his prediction that large models will always be necessary.

There is a point where an LLM is good enough for most tasks. I don't need a megamind AI in order to greet clients, and both large and small/medium models are getting there, with the large models hitting a computing/energy-demand barrier. The small models won't hit that barrier anytime soon.

replies(1): >>44317799 #
11. khalic ◴[] No.44317729[source]
I'm going to edit it; it was badly formulated. What I meant is that he underestimates their potential for growth.
replies(1): >>44317808 #
12. khalic ◴[] No.44317744[source]
I edited to make it clearer
13. khalic ◴[] No.44317749{4}[source]
Their potential, not how they work. It was very badly formulated; I just corrected it.
14. khalic ◴[] No.44317776[source]
Phi4-mini runs on a basic laptop CPU at 20T/s… how is that slow? Without optimization…
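
If you want to check that kind of number on your own machine, here's a minimal sketch using llama-cpp-python (the GGUF filename and thread count are placeholders; point it at whatever Phi-4-mini quant you've downloaded):

    # Minimal sketch: measure CPU generation speed for a small local model.
    # The model path below is a placeholder; substitute your own GGUF file.
    import time
    from llama_cpp import Llama

    llm = Llama(
        model_path="phi-4-mini-q4_k_m.gguf",  # placeholder filename
        n_ctx=4096,
        n_threads=8,  # match your physical cores
    )

    start = time.time()
    out = llm("Write a short greeting for a returning client.", max_tokens=256)
    elapsed = time.time() - start

    generated = out["usage"]["completion_tokens"]
    print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} T/s")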
replies(1): >>44317806 #
15. vikramkr ◴[] No.44317799{3}[source]
Did he predict they'd always be necessary? He mostly seemed to predict the opposite, that we're at the early stage of a trajectory that has yet to have its Linux moment.
replies(1): >>44322878 #
16. dist-epoch ◴[] No.44317806{3}[source]
I was running Qwen3-32B locally even faster, at 70 T/s, and it was still way too slow for me. I'm generating thousands of tokens of output per request (not coding). Running locally I could get about 6 million tokens per day and pay for the electricity, or I can get more tokens per day from Google Gemini 2.5 Flash for free.
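
That 6 million figure is just the throughput multiplied out over a day; a quick back-of-the-envelope check (assuming the machine generates around the clock, which is the best case):

    # Daily output at a fixed local generation speed, assuming
    # continuous generation (best case).
    tokens_per_second = 70
    seconds_per_day = 24 * 60 * 60          # 86,400
    daily_tokens = tokens_per_second * seconds_per_day
    print(f"{daily_tokens:,} tokens/day")   # 6,048,000 -> roughly 6 million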

Running models locally is a privilege for the rich and those with too much disposable time.

replies(1): >>44423931 #
17. diggan ◴[] No.44317808{3}[source]
> underestimates their potential for growth

As far as I understood the talk and the analogies, he's saying that local models will eventually replace the current popular "mainframe" architecture. How is that underestimating them?

18. khalic ◴[] No.44322878{4}[source]
I understand, thanks for pointing that out
19. yencabulator ◴[] No.44423931{4}[source]
Try Qwen3-30B-A3B. It's MoE to the extent that its memory-bandwidth use looks more like a 3B model's, and thus it typically runs faster.
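
The rough intuition: decoding is memory-bound, so what matters is how many bytes of weights get streamed per token, i.e. the active parameters, not the total. A back-of-the-envelope sketch (the bandwidth and quantization figures here are illustrative assumptions, not measurements):

    # Rough decode-speed estimate for a memory-bound model:
    #   tokens/s ~= memory bandwidth / bytes of weights read per token
    # The bandwidth and 4-bit quantization below are illustrative assumptions.
    BYTES_PER_WEIGHT = 0.5      # ~4-bit quantization
    MEM_BANDWIDTH_GBS = 100     # e.g. a decent desktop's RAM bandwidth

    def est_tokens_per_sec(active_params_billions: float) -> float:
        bytes_per_token = active_params_billions * 1e9 * BYTES_PER_WEIGHT
        return MEM_BANDWIDTH_GBS * 1e9 / bytes_per_token

    print(f"dense 32B:      ~{est_tokens_per_sec(32):.1f} T/s")
    print(f"MoE, 3B active: ~{est_tokens_per_sec(3):.1f} T/s")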