
How large are large language models?

(gist.github.com)


ljoshua ◴[02 Jul 25 13:00 UTC] No.44443222[source]▶

>>44442072 (OP) #

Less a technical comment and more just a mind-blown comment, but I still can’t get over just how much data is compressed into and available in these downloadable models. Yesterday I was on a plane with no WiFi, but had gemma3:12b downloaded through Ollama. Was playing around with it and showing my kids, and we fired history questions at it, questions about recent video games, and some animal fact questions. It wasn’t perfect, but holy cow the breadth of information that is embedded in an 8.1 GB file is incredible! Lossy, sure, but a pretty amazing way of compressing all of human knowledge into something incredibly contained.

replies(22): >>44443263 #>>44443274 #>>44443296 #>>44443751 #>>44443781 #>>44443840 #>>44443976 #>>44444227 #>>44444418 #>>44444471 #>>44445299 #>>44445966 #>>44446013 #>>44446775 #>>44447373 #>>44448218 #>>44448315 #>>44448452 #>>44448810 #>>44449169 #>>44449182 #>>44449585 #
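The figures in the comment above allow a quick back-of-envelope check of how densely the file stores the model. This is only a sketch under two assumptions that the comment does not state outright: that the "12b" in the model name means roughly 12 billion parameters, and that "8.1 GB" means 8.1e9 bytes.

```python
# Back-of-envelope: average bits stored per parameter in the downloaded file.
# Assumptions (not stated in the thread): "12b" ~ 12e9 parameters,
# and "8.1 GB" ~ 8.1e9 bytes.
PARAMS = 12e9
FILE_BYTES = 8.1e9


def bits_per_parameter(file_bytes: float, params: float) -> float:
    """Average number of bits the file spends on each parameter."""
    return file_bytes * 8 / params


bpp = bits_per_parameter(FILE_BYTES, PARAMS)
print(f"{bpp:.1f} bits per parameter")

# For comparison: the same parameter count stored as 16-bit values.
fp16_gb = PARAMS * 2 / 1e9
print(f"{fp16_gb:.0f} GB at 16 bits per parameter")
```

Under those assumptions the file averages well under 16 bits per parameter, which is consistent with the download being a quantized build rather than full-precision weights.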

exe34 ◴[02 Jul 25 13:09 UTC] No.44443296[source]▶

>>44443222 #

Wikipedia is about 24GB, so if you're allowed to drop 2/3 of the details and make up the missing parts by splicing in random text, 8GB doesn't sound too bad.

To me the amazing thing is that you can tell the model to do something, even follow simple instructions in plain English, like make a list or write some python code to do $x, that's the really amazing part.

replies(2): >>44443455 #>>44444576 #

Nevermark ◴[02 Jul 25 14:59 UTC] No.44444576[source]▶

>>44443296 #

It blows my mind that I can ask for 50 synonyms and instantly get a great list with great meaning summaries.

Then ask for the same list sorted and get that nearly instantly.

These models have a short time context for now, but they already have a huge “working memory” relative to us.

It is very cool. And indicative that vastly smarter models are going to be achieved fairly easily, with new insight.

Our biology has had to ruthlessly work within our biological/ecosystem energy envelope, and with the limited value/effort returned by a pre-internet pre-vast economy.

So biology has never been able to scale. Just get marginally more efficient and effective within tight limits.

Suddenly (in historical, biological terms), energy availability limits have been removed, and limits on the value of work have compounded and continue to do so. Unsurprising that those changes suddenly unlock easily achieved vast untapped room for cognitive upscaling.

replies(1): >>44444874 #

1. Wowfunhappy ◴[02 Jul 25 15:23 UTC] No.44444874{3}[source]▶

>>44444576 #

> These models [...] have a huge “working memory” relative to us. [This is] indicative that vastly smarter models are going to be achieved fairly easily, with new insight.

I don't think your second sentence logically follows from the first.

Relative to us, these models:

- Have a much larger working memory.

- Have much more limited logical reasoning skills.

To some extent, these models are able to use their superior working memories to compensate for their limited reasoning abilities. This can make them very useful tools! But there may well be a ceiling to how far that can go.

When you ask a model to "think about the problem step by step" to improve its reasoning, you are basically just giving it more opportunities to draw on its huge memory bank and try to put things together. But humans are able to reason with orders of magnitude less training data. And by the way, we are out of new training data to give the models.

replies(4): >>44445369 #>>44446428 #>>44448182 #>>44449367 #

2. antonvs ◴[02 Jul 25 16:02 UTC] No.44445369[source]▶

>>44444874 (TP) #

> Have much more limited logical reasoning skills.

Relative to the best humans, perhaps, but I seriously doubt this is true in general. Most people I work with couldn’t reason nearly as well through the questions I use LLMs to answer.

It’s also worth keeping in mind that having a different approach to reasoning is not necessarily equivalent to a worse approach. Watch out for cherry-picking the cons of its approach and ignoring the pros.

replies(1): >>44446443 #

3. exe34 ◴[02 Jul 25 17:27 UTC] No.44446428[source]▶

>>44444874 (TP) #

> But humans are able to reason with orders of magnitude less training data.

Common belief, but false. You start learning from inside the womb. The data flow increases exponentially when you open your eyes and then again when you start manipulating things with your hands and mouth.

> When you ask a model to "think about the problem step by step" to improve its reasoning, you are basically just giving it more opportunities to draw on its huge memory bank and try to put things together.

We do the same with children. At least I did it to my classmates when they asked me for help. I'd give them a hint, and ask them to work it out step by step from there. It helped.

replies(2): >>44447596 #>>44449413 #

4. exe34 ◴[02 Jul 25 17:27 UTC] No.44446443[source]▶

>>44445369 #

> Relative to the best humans,

For some reason, the bar for AI is always against the best possible human, right now.

replies(1): >>44456497 #

5. Wowfunhappy ◴[02 Jul 25 19:06 UTC] No.44447596[source]▶

>>44446428 #

> Common belief, but false. You start learning from inside the womb. The data flow increases exponentially when you open your eyes and then again when you start manipulating things with your hands and mouth.

But you don't get data equal to the entire internet as a child!

> We do the same with children. At least I did it to my classmates when they asked me for help. I'd give them a hint, and ask them to work it out step by step from there. It helped.

And I do it with my students. I still think there's a difference in kind between when I listen to my students (or other adults) reason through a problem, and when I look at the output of an AI's reasoning, but I admittedly couldn't tell you what that is, so point taken. I still think the AI is relying far more heavily on its knowledge base.

replies(2): >>44448125 #>>44449426 #

6. jacobr1 ◴[02 Jul 25 19:57 UTC] No.44448125{3}[source]▶

>>44447596 #

There seem to be lots of mixed data points on this, but to some extent there is knowledge encoded into the evolutionary base state of the new human brain. Probably not directly as knowledge, but "primed" to quickly establish relevant world models and certain types of reasoning with a small number of examples.

7. jacobr1 ◴[02 Jul 25 20:03 UTC] No.44448182[source]▶

>>44444874 (TP) #

> And by the way, we are out of new training data to give the models.

Only easily accessible text data. We haven't really started using video at scale yet, for example. It looks like data for specific tasks goes really far too ... for example, agentic coding interactions aren't something that has generally been captured on the internet. But capturing interactions with coding agents, in combination with the base training on existing programming knowledge, is resulting in significant performance increases. The amount of specialized data we might need to gather or synthetically generate is perhaps orders of magnitude less than presumed with pure supervised learning systems. And for other applications like industrial automation or robotics, we've barely started capturing all the sensor data that lives in those systems.

replies(1): >>44450166 #

8. Nevermark ◴[02 Jul 25 22:08 UTC] No.44449367[source]▶

>>44444874 (TP) #

My response completely acknowledged their current reasoning limits.

But in evolutionary time frames, clearly those limits are lifting extraordinarily quickly. By many orders of magnitude.

And the point I made stands: our limits were imposed by harsh biological energy and reward limits, whereas today's models (and their successors) have access to relatively unlimited energy and, via sharing value with unlimited customers, unlimited rewards.

It is a much simpler problem to improve digital cognition in a global ecosystem of energy production, instant communication and global application, than it was for evolution to improve an individual animal's cognition within the limited resources of local habitats and their inefficient communication of advances.

9. ◴[02 Jul 25 22:14 UTC] No.44449413[source]▶

>>44446428 #

10. oceanplexian ◴[02 Jul 25 22:15 UTC] No.44449426{3}[source]▶

>>44447596 #

Your field of vision is equivalent to something like 500 Megapixels. And assume it’s uncompressed because it’s not like your eyeballs are doing H.264.

Given vision and the other senses, I’d argue that your average toddler has probably trained on more sensory information than the largest LLMs ever built long before they learn to talk.

replies(1): >>44449781 #
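The comparison in the comment above can be roughed out numerically. Only the 500-megapixel figure comes from the comment; every other value below (bytes per pixel, effective frame rate, waking hours, age, and the size of an LLM text corpus) is a hypothetical placeholder chosen for illustration, so the result is an order-of-magnitude sketch at best.

```python
# Rough estimate of raw visual input for a toddler vs. an LLM text corpus.
# Only PIXELS comes from the comment; everything else is an assumption.
PIXELS = 500e6             # from the comment: ~500 megapixels
BYTES_PER_PIXEL = 3        # assumption: uncompressed 24-bit colour
FRAMES_PER_SECOND = 10     # assumption: effective "frame rate" of vision
WAKING_HOURS_PER_DAY = 12  # assumption
AGE_YEARS = 2              # assumption: before the child talks fluently

# Assumption: a corpus of ~15e12 tokens at ~4 bytes per token.
LLM_CORPUS_BYTES = 15e12 * 4


def visual_bytes(years: float) -> float:
    """Total raw visual bytes received over the given number of years."""
    bytes_per_second = PIXELS * BYTES_PER_PIXEL * FRAMES_PER_SECOND
    waking_seconds = years * 365 * WAKING_HOURS_PER_DAY * 3600
    return bytes_per_second * waking_seconds


total = visual_bytes(AGE_YEARS)
ratio = total / LLM_CORPUS_BYTES
print(f"visual input: {total / 1e15:.0f} PB, ratio to corpus: {ratio:.0f}x")
```

Changing any of the assumed constants moves the ratio by the same factor, and raw byte counts say nothing about how much of that input is actually attended to or retained.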

11. all2 ◴[02 Jul 25 23:09 UTC] No.44449781{4}[source]▶

>>44449426 #

There's an adaptation in there somewhere, though. Humans have a 'field of view' that constrains input data, and on the data processing side we have a 'center of focus' that generally rests wherever the eye rests (there's an additional layer where people learn to 'search' their vision by moving their mental center of focus without moving the physical focus point of the eye).

Then there's the whole slew of processes that pick up two or three key points of data and then fill in the rest (EX the moonwalking bear experiment [0]).

I guess all I'm saying is that raw input isn't the only piece of the puzzle. Maybe it is at the start before a kiddo _knows_ how to focus and filter info?

[0] https://www.youtube.com/watch?v=xNSgmm9FX2s

12. ◴[03 Jul 25 00:04 UTC] No.44450166[source]▶

>>44448182 #

13. antonvs ◴[03 Jul 25 16:08 UTC] No.44456497{3}[source]▶

>>44446443 #

It seems that 90% of discussion about AI boils down to people who feel threatened by it in some way, and are lashing out in irrational ways as a result. (Source for 90% figure: Sturgeon's Law.)

replies(2): >>44457610 #>>44459331 #

14. exe34 ◴[03 Jul 25 17:59 UTC] No.44457610{4}[source]▶

>>44456497 #

1. X could happen.

2. I would hate if X happened.

3. Therefore X is not possible.

15. Wowfunhappy ◴[03 Jul 25 21:25 UTC] No.44459331{4}[source]▶

>>44456497 #

But doesn't this also apply to the other side of the argument? People are invested in AI professionally, financially, or just emotionally because they want it to make their lives better, and so they lose sight of AI's flaws.

I don't know who is right, which IMHO is what makes this topic interesting.
