To me the biggest surprise was seeing Grok dominate in all of their published benchmarks. I haven’t seen any independent benchmarks of it yet (so I take the published ones with a giant heap of salt), but it’s still interesting.
I’m rooting for Anthropic.