Recent AI model progress feels mostly like bullshit

This is based on my own experience with a codebase that uses a lot of custom algorithms on trees and sometimes graphs.

There have been qualitative leaps in my day-to-day usage:

Claude Sonnet 3.5 and ChatGPT o1 were good for writing slop and debugging simple bugs.

Grok Thinking and Sonnet 3.7 were good at catching mildly complicated bugs and writing functions with basic logic. They still made mistakes.

But recently, Gemini 2.5 Pro has been scary good. I used to make fun of the feel-the-AGI crowd, but for the first time a model made me raise an eyebrow.

It can one-shot unusual functions with complicated logic and subtle edge cases.