
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 2 comments
bloomingkales ◴[] No.42949616[source]
If an LLM output is like a sculpture, then we have to sculpt it. I never did sculpting, but I do know they first get the clay spinning on a plate.

Whatever you want to call this “reasoning” step, ultimately it really is just throwing the model into a game loop. We want to interact with it on each tick (spin the clay), and sculpt every second until it looks right.

You will need to loop against an LLM to do just about anything and everything, forever; this is the default workflow.

Those who think we will quell our thirst for compute have another thing coming; we're going to be insatiable in how much brute-force LLM looping we do.

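A minimal sketch of that game loop in Python, with a placeholder generate() standing in for whatever LLM API is used (nothing here is a specific library call; the steering note plays the role of the hand on the clay):

    # Placeholder for any LLM completion API -- not a real library call.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM call here")

    # Game loop: on each tick, inspect the output and nudge it.
    def sculpt(task: str, max_ticks: int = 10) -> str:
        draft = generate(task)
        for _ in range(max_ticks):
            note = input("Steering note (blank to accept): ")
            if not note:
                break
            draft = generate(f"{task}\n\nCurrent draft:\n{draft}\n\nRevise per: {note}")
        return draft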
replies(3): >>42955281 #>>42955806 #>>42956482 #
zoogeny ◴[] No.42955806[source]
I can't believe this hasn't been done yet; perhaps it is a cost issue.

My literal first thought about AI was wondering why we couldn't just put it in a loop. Heck, one update per day, or one update per hour, would even be a start. You have a running "context"; the output becomes the next context (or a set of transformations on a context that is a bit larger than the output window). Then ramp that up ... one loop per minute, one per second, millisecond, microsecond.

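A sketch of that bare loop, again assuming a placeholder generate() rather than any real API, with a crude tail truncation to keep the running context inside the window (the bound is illustrative):

    # Placeholder for any LLM completion API.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM call here")

    MAX_CHARS = 32_000  # illustrative bound on the running context

    context = "Initial state of the running context."
    for tick in range(1_000):                      # one update per tick, at whatever rate
        context = generate(context)[-MAX_CHARS:]  # output becomes the next input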
replies(2): >>42955958 #>>42956117 #
int_19h ◴[] No.42956117[source]
The hard part is coming up with a good way to grade results, which you need in order to update the weights based on the outcome; otherwise the model will not actually learn anything.
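For instance, if the loop were generating code, grading could be as blunt as a pass rate over a set of checks, and that scalar is what an RL-style weight update would consume. A hypothetical sketch (the checks are illustrative, not a real benchmark):

    # Hypothetical grader: the fraction of checks an output passes.
    # A weight update would use this number as its reward signal.
    def grade(output: str, checks: list) -> float:
        return sum(1 for check in checks if check(output)) / len(checks)

    checks = [
        lambda s: "def " in s,    # produced a function at all
        lambda s: len(s) < 2000,  # stayed concise
    ]

    print(grade("def f(): pass", checks))  # 1.0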
replies(1): >>42956298 #
zoogeny ◴[] No.42956298[source]
For the "looping" I'm talking about, you don't need to update the weights. It is simply: old context in, new context out; new context in, new-new context out; etc.

Of course, keeping that coherent over numerous loops isn't going to be easy. No doubt there is a chance it goes off the rails. So you might have a section of context that stays stable, a section of context that updates each loop, etc.

In the other response to my comment, someone mentioned eventually updating the weights (e.g. daily); in that case you would have to have some kind of loss function.

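One way to sketch that stable/updating split, with a placeholder generate() and illustrative sizes: a fixed section anchors every pass, and a rolling section is rewritten each loop and truncated so the prompt stays bounded:

    # Placeholder for any LLM completion API.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM call here")

    STABLE = "Fixed instructions that anchor the loop."  # never changes
    ROLLING_LIMIT = 16_000  # illustrative cap on the updating section

    rolling = "Nothing has happened yet."
    for tick in range(100):
        prompt = STABLE + "\n\n" + rolling[-ROLLING_LIMIT:]
        rolling = generate(prompt)  # only the rolling section is replaced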
replies(3): >>42957903 #>>42958106 #>>42958279 #
int_19h ◴[] No.42958106[source]
Then I'm not quite sure what benefit you expect to derive from it? Making e.g. QwQ-32 loop isn't hard; it often does it all by itself, even. But it doesn't translate to improvements on every iteration; it just goes in circles.
replies(1): >>42962817 #