S1: A $6 R1 competitor?

(timkellogg.me)


mtrovo ◴[05 Feb 25 16:48 UTC] No.42951263[source]▶

I found the discussion around inference scaling with the 'Wait' hack so surreal. The fact that such an ingeniously simple method can impact performance makes me wonder how much low-hanging fruit we're still missing. So weird to think that improvements in a branch of computer science are boiling down to conjuring the right incantation words. How do you even change your mindset to start thinking this way?
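
For readers who have not seen the 'Wait' hack, here is a rough sketch of the idea as I understand it: when the model tries to stop reasoning before a minimum budget is spent, the stop marker is swapped for the word "Wait" and generation continues. Everything named here is an assumption for illustration only: `END_OF_THINKING`, `budget_forced_generate`, and `toy_model` are hypothetical, and the toy model is a stand-in that emits canned tokens, not a real LLM or any library's API.

```python
END_OF_THINKING = "</think>"  # hypothetical end-of-reasoning marker


def budget_forced_generate(step_fn, prompt, min_thinking_tokens, max_tokens=64):
    """Generate reasoning tokens, replacing an early stop with "Wait".

    step_fn(prompt, tokens) returns the next token (hypothetical interface).
    """
    tokens = []
    while len(tokens) < max_tokens:
        tok = step_fn(prompt, tokens)
        if tok == END_OF_THINKING:
            if len(tokens) >= min_thinking_tokens:
                break  # budget spent, let the model stop
            tok = "Wait"  # suppress the stop and nudge the model to keep going
        tokens.append(tok)
    return tokens


def toy_model(prompt, tokens):
    """Stand-in model: emits three "step" tokens after the last "Wait", then tries to stop."""
    since_wait = 0
    for tok in reversed(tokens):
        if tok == "Wait":
            break
        since_wait += 1
    return "step" if since_wait < 3 else END_OF_THINKING


short = budget_forced_generate(toy_model, "question", min_thinking_tokens=0)
forced = budget_forced_generate(toy_model, "question", min_thinking_tokens=6)
print(short)
print(forced)
```

With a budget of 0 the toy model stops on its own; with a budget of 6 its first attempt to stop is replaced by "Wait" and it produces a second run of reasoning tokens.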

replies(16): >>42951704 #>>42951764 #>>42951829 #>>42953577 #>>42954518 #>>42956436 #>>42956535 #>>42956674 #>>42957820 #>>42957909 #>>42958693 #>>42960400 #>>42960464 #>>42961717 #>>42964057 #>>43000399 #

xg15 ◴[05 Feb 25 19:13 UTC] No.42953577[source]▶

>>42951263 #

I think the fact alone that distillation and quantization are techniques that can produce substantial improvements is a strong sign that we still have no real comprehensive understanding how the models work.

If we did, there would be no reason to train a model with more parameters than are strictly necessary to represent the space's semantic structure. But then it should be impossible for distilled models with fewer parameters to come close to the performance of the original model.

Yet this is what happens - the distilled or quantized models often come very close to the original model.

So I think there are still many low-hanging fruits to pick.
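
As a toy illustration of the quantization half of this point, the sketch below rounds a vector of weights to 8-bit integers and compares a dot product before and after. It assumes symmetric per-tensor int8 rounding on random Gaussian weights, which is far simpler than what real LLM quantizers do; the function names are made up for the example, and it says nothing about distillation.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero input; any scale works
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]
inputs = [random.gauss(0.0, 1.0) for _ in range(1024)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each weight moves by at most half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
exact = dot(weights, inputs)
approx = dot(restored, inputs)
print(f"scale={scale:.5f} max_weight_error={max_err:.5f}")
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The per-weight error is bounded by half the quantization step, which is why the output of a single layer barely moves here. Whether that carries over to a full model is an empirical question, which is the commenter's point.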

replies(5): >>42955228 #>>42956999 #>>42957002 #>>42959159 #>>42966394 #

teruakohatu ◴[05 Feb 25 21:13 UTC] No.42955228[source]▶

>>42953577 #

> still have no real comprehensive understanding how the models work.

We do understand how they work, we just have not optimised their usage.

For example, take someone who has a good general understanding of how an ICE or EV car works: even if the user interface is very unfamiliar, they can figure out how to drive any car within a couple of minutes.

But that does not mean they can race a car, drift a car or drive a car on challenging terrain even if the car is physically capable of all these things.

replies(3): >>42955842 #>>42955941 #>>42962716 #

gessha ◴[05 Feb 25 22:04 UTC] No.42955941[source]▶

>>42955228 #

Your example is somewhat inadequate. We _fundamentally_ don’t understand how deep learning systems work, in the sense that they are more or less black boxes that we train and evaluate. Innovations in ML are a whole bunch of wizards with big stacks of money changing “Hmm” to “Wait” and seeing what happens.

Would a different sampler help you? I dunno, try it. Would a smaller dataset help? I dunno, try it. Would training the model for 5000 days help? I dunno, try it.

Car technology is the opposite of that - it’s a white box. It’s composed of very well defined elements whose interactions are defined and explained by laws of thermodynamics and whatnot.

replies(2): >>42959322 #>>42960342 #

1. raducu ◴[06 Feb 25 08:36 UTC] No.42960342[source]▶

>>42955941 #

> _fundamentally_ don’t understand how deep learning systems work.

It's like saying we don't understand how quantum chromodynamics works. Very few people do, and it's the kind of knowledge that is not easily distilled for the masses in an easily digestible, popsci way.

Look into how older CNNs work -- we have very good visual/accessible/popsci materials on how they work.

I'm sure we'll have that for LLMs, but it's not worth it to the people who can produce that kind of material to produce it now, when the field is moving so rapidly; those people's time is much better spent improving the LLMs.

The kind of progress being made leads me to believe there absolutely ARE people who absolutely know how the LLMs work and they're not just a bunch of monkeys randomly throwing things at GPUs and seeing what sticks.

replies(2): >>42961916 #>>42965302 #

2. gessha ◴[06 Feb 25 13:03 UTC] No.42961916[source]▶

>>42960342 (TP) #

As a person who has trained a number of computer vision deep networks, I can tell you that we have some cool-looking visualizations of how lower layers work but no idea how later layers work. The intuition is built over training numerous networks and trying different hyperparameters, data shuffling, activations, etc. It’s absolutely brutal over here. If the theory were there, people like Karpathy who have great teacher vibes would’ve explained it for the mortal grad students or enthusiast tinkerers.

> The kind of progress being made leads me to believe there absolutely ARE people who absolutely know how the LLMs work and they're not just a bunch of monkeys randomly throwing things at GPUs and seeing what sticks

I say this less as an authoritative voice but more as an amused insider: Spend a week with some ML grad students and you will get a chuckle whenever somebody says we’re not some monkeys throwing things at GPUs.

replies(1): >>42962093 #

3. bloomingkales ◴[06 Feb 25 13:29 UTC] No.42962093[source]▶

>>42961916 #

It may be as simple as this:

https://youtube.com/shorts/7GrecDNcfMc

Many many layers of that. It’s not a profound mechanism. We can understand how that works, but we’re dumbfounded how such a small mechanism is responsible for all this stuff going on inside a brain.

I don’t think we don’t understand, it’s a level beyond that. We can’t fathom the implications, that it could be that simple, just scaled up.

replies(1): >>42965342 #

4. ClumsyPilot ◴[06 Feb 25 18:51 UTC] No.42965302[source]▶

>>42960342 (TP) #

> The kind of progress being made leads me to believe there absolutely ARE people who absolutely know how the LLMs work

Just like alchemists made enormous strides in chemistry, but their goal was to turn piss into gold.

5. ClumsyPilot ◴[06 Feb 25 18:56 UTC] No.42965342{3}[source]▶

>>42962093 #

> Many many layers of that. It’s not a profound mechanism

Bad argument. Cavemen understood stone, but they could not build the aqueducts. Medieval people understood iron, water and fire, but they could not make a steam engine.

Finally we understand protons, electrons, and neutrons and the forces that govern them, but it does not mean we understand everything they could possibly make.

replies(1): >>42965612 #

6. bloomingkales ◴[06 Feb 25 19:26 UTC] No.42965612{4}[source]▶

>>42965342 #

"Cavemen understood stone"

How far removed you are from a caveman is the better question. There would be quite some arrogance coming out of you to suggest the several-million-year gap is anything but an instant in the grand timeline. As in, you understood stone just yesterday ...

The monkey that found the stone is the monkey that built the cathedral. It's only a delusion the second monkey creates to separate it from the first monkey (a feeling of superiority, with the only tangible asset being "a certain amount of notable time passed between point A and point B").

"Finally we understand protons, electrons, and neutrons and the forces that govern them but it does not mean we understand everything they could possibly make"

You and I agree that those simple things can truly create infinite possibilities. That's all I was saying: we cannot fathom it (either because infinity is hard to fathom, or because its origins are humble - just a few core elements, or both, or something else).

Anyway, this discussion can head in any direction.
