
262 points rain1 | 13 comments
ljoshua ◴[] No.44443222[source]
Less a technical comment and more just a mind-blown comment, but I still can’t get over just how much data is compressed into and available in these downloadable models. Yesterday I was on a plane with no WiFi, but had gemma3:12b downloaded through Ollama. Was playing around with it and showing my kids, and we fired history questions at it, questions about recent video games, and some animal fact questions. It wasn’t perfect, but holy cow the breadth of information that is embedded in an 8.1 GB file is incredible! Lossy, sure, but a pretty amazing way of compressing all of human knowledge into something incredibly contained.
replies(22): >>44443263 #>>44443274 #>>44443296 #>>44443751 #>>44443781 #>>44443840 #>>44443976 #>>44444227 #>>44444418 #>>44444471 #>>44445299 #>>44445966 #>>44446013 #>>44446775 #>>44447373 #>>44448218 #>>44448315 #>>44448452 #>>44448810 #>>44449169 #>>44449182 #>>44449585 #
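As a back-of-the-envelope check on that "8.1 GB file" figure: if gemma3:12b has roughly 12 billion parameters (an assumption based on the model name), the download works out to a bit over 5 bits per parameter, which is consistent with a 4–5 bit quantized model plus overhead. A quick sketch of the arithmetic:

```python
# Rough bits-per-parameter arithmetic for the model mentioned above.
# Assumptions: ~12e9 parameters (inferred from the "12b" name) and an
# 8.1 GB download, treating GB as 10^9 bytes.
params = 12e9
size_bytes = 8.1e9

bits_per_param = size_bytes * 8 / params
print(f"{bits_per_param:.1f} bits per parameter")  # ~5.4
```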
1. agumonkey ◴[] No.44443840[source]
Intelligence is compression, some say.
replies(5): >>44444701 #>>44445011 #>>44445637 #>>44446842 #>>44449234 #
2. Nevermark ◴[] No.44444701[source]
Very much so!

The more and faster a “mind” can infer, the less it needs to store.

Think how many fewer facts a symbolic system that can perform calculus needs to store, vs. an algebraic or merely arithmetic system, to cover the same numerical problem-solving space. Many orders of magnitude fewer.
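
A toy illustration of that trade-off (all names here are invented for illustration): a system that memorizes facts needs one stored entry per question, while a system that stores the rule covers the whole infinite family at once.

```python
# "Arithmetic system": memorize individual derivative facts, one entry
# per (function, point) pair -- storage grows without bound.
derivative_table = {
    ("x**2", 1.0): 2.0,
    ("x**2", 3.0): 6.0,
}

# "Calculus system": one stored rule replaces the whole table.
def power_rule_derivative(n, x):
    """d/dx x**n = n * x**(n-1): one rule, infinitely many facts."""
    return n * x ** (n - 1)

print(power_rule_derivative(2, 3.0))  # 6.0, no lookup entry needed
```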

The same goes for higher orders of reasoning. General or specific subject related.

And higher order reasoning vastly increases capabilities extending into new novel problem spaces.

I think model sizes may temporarily drop significantly, after every major architecture or training advance.

In the long run, “A circa 2025 maxed M3 Ultra Mac Studio is all you need!” (/h? /s? Time will tell.)

replies(1): >>44446063 #
3. goatlover ◴[] No.44445011[source]
How well does that apply to robotics or animal intelligence? Manipulating the real world is more fundamental to human intelligence than compressing text.
replies(1): >>44445464 #
4. ToValueFunfetti ◴[] No.44445464[source]
Under the predictive coding model (and I'm sure some others), animal intelligence is also compression. The idea is that the early layers of the brain minimize how surprising incoming sensory signals are, so the later layers only have to work with truly entropic signal. But it has non-compression-based intelligence within those more abstract layers.
replies(1): >>44448055 #
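A minimal toy sketch of that idea (illustrative only, not a neuroscience model): an early stage predicts each incoming sample, and only the prediction error, the surprising part, gets passed upward to later stages.

```python
# Predictive-coding toy: the predictor assumes the next sensory sample
# repeats the last one; upper layers receive only the residual.
def predict(history):
    return history[-1] if history else 0.0

signal = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
residuals = []
history = []
for sample in signal:
    residuals.append(sample - predict(history))  # "prediction error"
    history.append(sample)

print(residuals)  # mostly zeros; only the change at index 3 survives
```

A steady signal compresses to almost nothing; only genuine change carries information upward.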
5. penguin_booze ◴[] No.44445637[source]
I don't know why, but I was reminded of Douglas Hofstadter's talk: Analogy is cognition: https://www.youtube.com/watch?v=n8m7lFQ3njk&t=964s.
6. agumonkey ◴[] No.44446063[source]
I wonder who else took notes by diffing their own assumptions against lectures and talks: noting what was really new compared to their previous conceptual state, what added new information.
7. tshaddox ◴[] No.44446842[source]
Some say that. But what I value even more than compression is the ability to create new ideas which do not in any way exist in the set of all previously-conceived ideas.
replies(1): >>44449294 #
8. goatlover ◴[] No.44448055{3}[source]
I just wonder if neuroscientists use that kind of model.
replies(1): >>44449784 #
9. hamilyon2 ◴[] No.44449234[source]
Crystallized intelligence is. I am not sure about fluid intelligence.
replies(1): >>44449396 #
10. benreesman ◴[] No.44449294[source]
I'm toying with the phrase "precedented originality" as a way to describe the optimal division of labor when I work with Opus 4 running hot (which is the first one where I consistently come out ahead by using it). That model at full flog seems to be very close to the asymptote for the LLM paradigm on coding: they've really pulled out all the stops (the temperature is so high it makes trivial typographical errors, it will discuss just about anything, it will churn for 10, 20, 30 seconds to first token via API).

It's good enough that it has changed my mind about the fundamental utility of LLMs for coding in non-JavaScript complexity regimes.

But it's still not an expert programmer, not by a million miles; there is no way I could delegate my job to it (and keep my job). So there's some interesting boundary that's different from what I used to think.

I think it's in the vicinity of "how much precedent exists for this thought or idea or approach". The things I bring to the table in that setting have precedent too, but they are much more tenuously connected to any one clear precedent on, e.g., GitHub, because if the thing I needed were on GitHub I would download it.

11. antisthenes ◴[] No.44449396[source]
Fluid intelligence is just how quickly you acquire crystallized intelligence.

It's the first derivative.

replies(1): >>44449628 #
12. agumonkey ◴[] No.44449628{3}[source]
Speaking of which, people designed a memory game, dual n-back, which allegedly improves fluid intelligence.
13. ToValueFunfetti ◴[] No.44449784{4}[source]
I doubt there's any consensus on one model, but it's certainly true that many neuroscientists use predictive coding models at least some of the time.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C36&q=pre...