A faster model that outperforms its slower counterpart on multiple benchmarks? Can anyone explain why that makes sense? Are they just training on the benchmark test sets?
replies(4):
It could be anything: a different architecture, more data, RL, etc. It's probably RL. In recent months, top-tier labs seem to have "cracked" RL to a level not yet seen in open models, and by a large margin.
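For anyone wondering how RL can lift benchmark scores without training on the benchmarks themselves: the commonly described recipe is RL against an automatic verifier on tasks with checkable answers (math, code, etc.), so the model is rewarded for being correct rather than fed test data. Below is a minimal toy sketch of that loop under stated assumptions: the "policy" is a tabular softmax over answers to single-digit addition problems and the update is plain REINFORCE; everything here is made up for illustration, and real pipelines use a language model plus fancier optimizers (PPO/GRPO-style), not this.

```python
# Toy sketch of RL with verifiable rewards: sample an answer, check it with
# an automatic verifier, and reinforce correct samples. No benchmark data is
# ever trained on; the "benchmark" below is only used for evaluation.
import numpy as np

rng = np.random.default_rng(0)
ANSWERS = np.arange(2, 19)  # every possible sum of two digits in 1..9

# Policy parameters: one logit vector per prompt (a, b).
logits = {(a, b): np.zeros(len(ANSWERS))
          for a in range(1, 10) for b in range(1, 10)}

def sample(prompt):
    # Softmax over candidate answers for this prompt, then sample one.
    z = logits[prompt]
    p = np.exp(z - z.max())
    p /= p.sum()
    idx = rng.choice(len(ANSWERS), p=p)
    return idx, p

def benchmark_accuracy():
    # "Benchmark": greedy decoding over all prompts, never used for training.
    return np.mean([ANSWERS[np.argmax(logits[(a, b)])] == a + b
                    for a in range(1, 10) for b in range(1, 10)])

print(f"before RL: {benchmark_accuracy():.2f}")
lr = 1.0
for step in range(20000):
    a, b = rng.integers(1, 10, size=2)
    idx, p = sample((a, b))
    reward = 1.0 if ANSWERS[idx] == a + b else 0.0  # automatic verifier
    # REINFORCE: gradient of log p(idx) w.r.t. logits is (one_hot - p);
    # scale it by the reward so only verified-correct samples are pushed up.
    grad = -p
    grad[idx] += 1.0
    logits[(a, b)] += lr * reward * grad
print(f"after RL:  {benchmark_accuracy():.2f}")
```

In this toy the eval set happens to coincide with the training distribution, which is exactly what you can't assume for real models; the claim with the big labs is that RL on verifiable tasks transfers to held-out benchmarks, which would also explain a smaller, faster model scoring higher than a bigger base model.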