
600 points antirez | 1 comment | source
dakiol ◴[] No.44625484[source]
> Gemini 2.5 PRO | Claude Opus 4

Whether it's vibe coding, agentic coding, or copy-pasting from the web interface to your editor, it's still sad to see the normalization of private (i.e., paid) LLM models. I like the progress that LLMs introduce and I see them as a powerful tool, but I cannot understand how programmers (whether complete nobodies or popular figures) don't mind adding a strong dependency on a third party in order to keep programming. Programming used to be (and still is, to a large extent) an activity that can be done with open and free tools. I am afraid that in a few years that will no longer be possible (as in: most programmers will be so tied to a paid LLM that not using one would be like not using an IDE or vim today), since everyone is using private LLMs. The excuse "but you earn six figures, what's $200/month to you?" doesn't really capture the issue here.

replies(46): >>44625521 #>>44625545 #>>44625564 #>>44625827 #>>44625858 #>>44625864 #>>44625902 #>>44625949 #>>44626014 #>>44626067 #>>44626198 #>>44626312 #>>44626378 #>>44626479 #>>44626511 #>>44626543 #>>44626556 #>>44626981 #>>44627197 #>>44627415 #>>44627574 #>>44627684 #>>44627879 #>>44628044 #>>44628982 #>>44629019 #>>44629132 #>>44629916 #>>44630173 #>>44630178 #>>44630270 #>>44630351 #>>44630576 #>>44630808 #>>44630939 #>>44631290 #>>44632110 #>>44632489 #>>44632790 #>>44632809 #>>44633267 #>>44633559 #>>44633756 #>>44634841 #>>44635028 #>>44636374 #
simonw ◴[] No.44626556[source]
The models I can run locally aren't as good yet, and are way more expensive to operate.

Once it becomes economical to run a Claude 4 class model locally you'll see a lot more people doing that.

The closest you can get right now might be Kimi K2 on a pair of 512GB Mac Studios, at a cost of about $20,000.
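A rough back-of-envelope shows why it takes a pair of 512GB machines (the parameter count and quantization level here are assumptions for illustration, not official specs):

```python
# Memory estimate for serving a very large model locally (illustrative figures).
# Kimi K2 is reported as a roughly 1-trillion-parameter MoE model (assumption).
params = 1.0e12          # total parameters (assumed)
bytes_per_param = 1.0    # 8-bit quantization: one byte per weight
weights_gb = params * bytes_per_param / 1e9

# ~1000 GB of weights alone, before KV cache and runtime overhead,
# hence two 512GB machines rather than one.
print(f"~{weights_gb:.0f} GB of weights")
```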

replies(12): >>44627184 #>>44627617 #>>44627695 #>>44627852 #>>44628143 #>>44631034 #>>44631098 #>>44631352 #>>44631995 #>>44632684 #>>44633226 #>>44644288 #
oblio ◴[] No.44628143[source]
The thing is, code is quite compact. Why do LLMs need to train on content bigger than the size of the textual internet to be effective?

Total newb here.

replies(1): >>44628343 #
airspresso ◴[] No.44628343[source]
Many reasons, one being that LLMs are essentially compressing the training data to unbelievably small data volumes (the weights). When doing so, they can only afford to keep the general principles and semantic meaning of the training data. Bigger models can memorize more than smaller ones of course, but are still heavily storage limited. Through this process they become really good at semantic understanding of code and language in general. It takes a certain scale of training data to achieve that.
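The compression ratio involved can be sketched with rough numbers (token counts, bytes-per-token, and model size below are illustrative assumptions, not figures for any specific model):

```python
# Rough ratio of training data volume to weight storage (illustrative only).
train_tokens = 15e12       # ~15 trillion training tokens (assumed)
bytes_per_token = 4        # ~4 bytes of text per token (common rule of thumb)
params = 70e9              # a hypothetical 70B-parameter model
bits_per_param = 16        # bf16 weights

train_bytes = train_tokens * bytes_per_token   # ~60 TB of text
weight_bytes = params * bits_per_param / 8     # ~140 GB of weights

# Hundreds of bytes of training text per byte of weights: only general
# principles and semantics can survive, not verbatim content.
ratio = train_bytes / weight_bytes
print(f"training data ≈ {train_bytes/1e12:.0f} TB, "
      f"weights ≈ {weight_bytes/1e9:.0f} GB, ratio ≈ {ratio:.0f}:1")
```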
replies(1): >>44629047 #
oblio ◴[] No.44629047[source]
Yeah, I just asked Gemini and apparently some older estimates put a relatively filtered dataset of Github source code at around 21TB in 2018, and some more recent estimates could put it in the low hundreds of TB.

Considering, as you said, that LLMs are doing a form of compression, and assuming generously that you add extra compression on top, yeah, now I understand a bit more. Even if you focus on non-similar code to get the most coverage, I wouldn't be shocked if a modern, representative source-code training set from GitHub weighed 1TB, which is obviously a lot more than consumer-grade hardware can bear.

I guess we need to ramp up RAM production a bunch more :-(

Speaking of which, what's the next bottleneck, aside from storing the damned things? Training needs a ton of resources, but that part can be pooled; even for OSS models it "just" needs to be done "once", and then the entire community can use the result. So I guess inference is the scaling cost. What's the most-used resource there? Memory bandwidth for RAM?

replies(1): >>44652673 #
airspresso ◴[] No.44652673[source]
Yes, for inference the main bottleneck is GPU VRAM and the bandwidth between the GPU cores and VRAM. Ideally you want enough GPU VRAM to be able to load the entire model into VRAM + have room for caching the already-produced output in VRAM when you're generating output tokens. And fast enough VRAM bandwidth that you can copy the weights from VRAM to GPU compute cores as fast as possible to do the calculations for each token. This determines the tokens/sec speed you get for the output. So yes, more and faster VRAM is essential.
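This bound is easy to estimate: for a dense model at batch size 1, every generated token requires streaming all the weights from VRAM through the compute cores once, so tokens/sec is capped at bandwidth divided by model size. A sketch with assumed figures (the model size is hypothetical; the bandwidth is the published HBM3 spec for an H100 SXM):

```python
# Memory-bandwidth ceiling on decode speed (dense model, batch size 1).
model_bytes = 140e9        # hypothetical 70B-parameter model at 16-bit
bandwidth = 3.35e12        # ~3.35 TB/s, H100 SXM HBM3 published bandwidth

# Each output token reads all weights once, so this is an upper bound;
# real throughput is lower due to KV-cache reads and compute overhead.
tok_per_sec = bandwidth / model_bytes
print(f"~{tok_per_sec:.0f} tokens/sec upper bound")
```

This is why quantizing a model (fewer bytes to stream per token) speeds up single-stream decoding roughly in proportion, and why batching helps: the same weight read is amortized over many concurrent requests.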