
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 1 comment
Aperocky ◴[] No.42950592[source]
For all the hype about thinking models, this feels much more like compression in the information-theoretic sense than a "takeoff" scenario.

There is a finite amount of information stored in any large model; the models are really good at presenting the correct information back, and adding thinking blocks made them even better at doing that. But there is a cap to how far that goes.

Just as there is a theoretical maximum to how much a file can be compressed before the compression becomes lossy, there is a theoretical maximum to the relevant information a model can return, regardless of how long it is forced to think.
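
As a rough illustration of that floor, here is a minimal Python sketch (the file path "corpus.txt" is hypothetical) that estimates the order-0 byte entropy of a file, the floor for any code that encodes each byte independently, and compares it with what a general-purpose compressor like zlib actually achieves:

    # Sketch only: estimate the order-0 Shannon entropy of a file's bytes
    # (the floor for any code that encodes each byte independently) and
    # compare it with what zlib achieves. "corpus.txt" is a hypothetical
    # input file.
    import math
    import zlib
    from collections import Counter

    def byte_entropy_bits(data: bytes) -> float:
        """Shannon entropy of the byte distribution, in bits per byte."""
        counts = Counter(data)
        n = len(data)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    data = open("corpus.txt", "rb").read()
    floor = byte_entropy_bits(data)
    packed = zlib.compress(data, 9)
    print(f"order-0 entropy floor: {floor:.2f} bits/byte")
    print(f"zlib achieves:         {8 * len(packed) / len(data):.2f} bits/byte")

zlib can dip below the order-0 floor by exploiting repetition across bytes, but no lossless scheme can beat the true entropy of the source; past that point you have to start throwing information away.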

replies(3): >>42951063 #>>42956052 #>>42960773 #
1. zoogeny ◴[] No.42956052[source]
I think this is probably accurate, and what remains to be seen is how "compressible" the larger models are.

The fact that we can compress a GPT-3-sized model into an o1 competitor is only the beginning. Maybe there is even more juice to squeeze there?

But even more, how much performance will we get out of o3-sized models? That is what is exciting, since they are already performing at near-PhD level on most evals.