I don't think intelligence is increasing. Arbitrary benchmarks don't reflect real world usage. Even with all the context it could possibly have, these models still miss/hallucinate things. Doesn't make them useless, but saying context is the bottleneck is incorrect.
replies(3):