Wow, crushing 2.5 Flash on every benchmark is huge. Time to move all of my LLM workloads to a local GPU rig.
Just remember to benchmark it yourself first with your own private task collection, so you can actually measure the models against each other on your workload. Pretty much every public benchmark is unreliable right now, and making model choices based on others' benchmarks is bound to leave you disappointed.
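Rolling your own eval doesn't have to be fancy. Here's a rough sketch of the idea, assuming an OpenAI-compatible local server (llama.cpp, vLLM, and Ollama all expose one); the endpoint URL, the model names, and the tasks.jsonl format are placeholders for your own setup, and substring matching is a deliberately crude scoring rule you'd swap for whatever fits your tasks:

```python
import json
import requests  # assumes an OpenAI-compatible endpoint (llama.cpp / vLLM / Ollama)

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder local server URL

def ask(model: str, prompt: str) -> str:
    """Send one prompt to the server and return the reply text."""
    resp = requests.post(
        ENDPOINT,
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"].strip()

def score(model: str, tasks: list[dict]) -> float:
    """Fraction of tasks where the reply contains the expected answer."""
    hits = sum(
        1 for t in tasks if t["expected"].lower() in ask(model, t["prompt"]).lower()
    )
    return hits / len(tasks)

if __name__ == "__main__":
    # tasks.jsonl: one {"prompt": ..., "expected": ...} per line,
    # drawn from prompts you actually care about in production.
    with open("tasks.jsonl") as f:
        tasks = [json.loads(line) for line in f]
    for model in ("my-local-model", "gemini-2.5-flash"):  # placeholder model names
        print(f"{model}: {score(model, tasks):.0%}")
```

Even 30-50 tasks pulled from your real usage will tell you more than a leaderboard will.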