
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 2 comments
bloomingkales ◴[] No.42949616[source]
If an LLM output is like a sculpture, then we have to sculpt it. I never did sculpting, but I do know they first get the clay spinning on a plate.

Whatever you want to call this “reasoning” step, ultimately it really is just throwing the model into a game loop. We want to interact with it on each tick (spin the clay), and sculpt every second until it looks right.

You will need to loop against an LLM to do just about anything and everything, forever; this is the default workflow.

Those who think we will quell our thirst for compute have another thing coming; we're going to be insatiable in how much brute-force LLM looping we do.

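A minimal sketch of that game loop in Python, with a placeholder generate() standing in for whatever LLM API is used (nothing here is a specific library call; the steering note plays the role of the hand on the clay):

    # Placeholder for any LLM completion API -- not a real library call.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM call here")

    # Game loop: on each tick, inspect the output and nudge it.
    def sculpt(task: str, max_ticks: int = 10) -> str:
        draft = generate(task)
        for _ in range(max_ticks):
            note = input("Steering note (blank to accept): ")
            if not note:
                break
            draft = generate(f"{task}\n\nCurrent draft:\n{draft}\n\nRevise per: {note}")
        return draft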
replies(3): >>42955281 #>>42955806 #>>42956482 #
zoogeny ◴[] No.42955806[source]
I can't believe this hasn't been done yet; perhaps it is a cost issue.

My literal first thought about AI was wondering why we couldn't just put it in a loop. Heck, one update per day, or one update per hour, would even be a start. You have a running "context"; the output becomes the next context (or a set of transformations on a context that is a bit larger than the output window). Then ramp that up ... one loop per minute, one per second, millisecond, microsecond.

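A sketch of that bare loop, again assuming a placeholder generate() rather than any real API, with a crude tail truncation to keep the running context inside the window (the bound is illustrative):

    # Placeholder for any LLM completion API.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM call here")

    MAX_CHARS = 32_000  # illustrative bound on the running context

    context = "Initial state of the running context."
    for tick in range(1_000):                      # one update per tick, at whatever rate
        context = generate(context)[-MAX_CHARS:]  # output becomes the next input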
replies(2): >>42955958 #>>42956117 #
int_19h ◴[] No.42956117[source]
The hard part is coming up with a good way to grade results, which you need in order to update the weights based on the outcome; otherwise the model will not actually learn anything.
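For instance, if the loop were generating code, grading could be as blunt as a pass rate over a set of checks, and that scalar is what an RL-style weight update would consume. A hypothetical sketch (the checks are illustrative, not a real benchmark):

    # Hypothetical grader: the fraction of checks an output passes.
    # A weight update would use this number as its reward signal.
    def grade(output: str, checks: list) -> float:
        return sum(1 for check in checks if check(output)) / len(checks)

    checks = [
        lambda s: "def " in s,    # produced a function at all
        lambda s: len(s) < 2000,  # stayed concise
    ]

    print(grade("def f(): pass", checks))  # 1.0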
replies(1): >>42956298 #
zoogeny ◴[] No.42956298[source]
For the "looping" I'm talking about, you don't need to update the weights. It is simply: old context in, new context out; new context in, new-new context out; etc.

Of course, keeping that coherent over numerous loops isn't going to be easy. No doubt there is a chance it goes off the rails. So you might have a section of context that stays stable, a section of context that updates each loop, etc.

In the other response to my comment, someone mentioned eventually updating the weights (e.g. daily); in that case you would have to have some kind of loss function.

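One way to sketch that stable/updating split, with a placeholder generate() and illustrative sizes: a fixed section anchors every pass, and a rolling section is rewritten each loop and truncated so the prompt stays bounded:

    # Placeholder for any LLM completion API.
    def generate(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM call here")

    STABLE = "Fixed instructions that anchor the loop."  # never changes
    ROLLING_LIMIT = 16_000  # illustrative cap on the updating section

    rolling = "Nothing has happened yet."
    for tick in range(100):
        prompt = STABLE + "\n\n" + rolling[-ROLLING_LIMIT:]
        rolling = generate(prompt)  # only the rolling section is replaced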
replies(3): >>42957903 #>>42958106 #>>42958279 #
int_19h ◴[] No.42958106[source]
Then I'm not quite sure what benefit you expect to derive from it? Making e.g. QwQ-32 loop isn't hard; it often does it all by itself, even. But it doesn't translate to improvements on every iteration; it just goes in circles.
replies(1): >>42962817 #