Reinforcement learning really helped Transformer-based LLMs improve in quality and reasoning, as we saw when DeepSeek was launched. I am curious whether this is equivalent to an early GPT-4o that has not yet reaped the benefits of the add-on technologies that helped improve its quality.