
S1: A $6 R1 competitor?

(timkellogg.me)
851 points by tkellogg | 1 comment
Aperocky ◴[] No.42950592[source]
For all the hype about thinking models, this feels much more like compression in the information-theoretic sense than a "takeoff" scenario.

There is a finite amount of information stored in any large model; the models are really good at presenting the correct information back, and adding thinking blocks made them even better at doing that. But there is a cap to how far that goes.

Just as there is a theoretical maximum to how much a file can be compressed before the compression becomes lossy, there is a theoretical maximum to the relevant information a model can return, regardless of how long it is forced to think.
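
As a rough illustration of that floor, here is a minimal Python sketch (the file path "corpus.txt" is hypothetical) that estimates the order-0 byte entropy of a file, the floor for any code that encodes each byte independently, and compares it with what a general-purpose compressor like zlib actually achieves:

    # Sketch only: estimate the order-0 Shannon entropy of a file's bytes
    # (the floor for any code that encodes each byte independently) and
    # compare it with what zlib achieves. "corpus.txt" is a hypothetical
    # input file.
    import math
    import zlib
    from collections import Counter

    def byte_entropy_bits(data: bytes) -> float:
        """Shannon entropy of the byte distribution, in bits per byte."""
        counts = Counter(data)
        n = len(data)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    data = open("corpus.txt", "rb").read()
    floor = byte_entropy_bits(data)
    packed = zlib.compress(data, 9)
    print(f"order-0 entropy floor: {floor:.2f} bits/byte")
    print(f"zlib achieves:         {8 * len(packed) / len(data):.2f} bits/byte")

zlib can dip below the order-0 floor by exploiting repetition across bytes, but no lossless scheme can beat the true entropy of the source; past that point you have to start throwing information away.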

replies(3): >>42951063 #>>42956052 #>>42960773 #
1. zoogeny ◴[] No.42956052[source]
I think this is probably accurate, and what remains to be seen is how "compressible" the larger models are.

The fact that we can compress a GPT-3-sized model into an o1 competitor is only the beginning. Maybe there is even more juice to squeeze there?

But even more, how much performance will we get out of o3-sized models? That is what is exciting, since they are already performing at near-PhD level on most evals.