
579 points | paulpauper | 1 comment
1. djha-skin No.43603897
> Since 3.5-sonnet, we have been monitoring AI model announcements, and trying pretty much every major new release that claims some sort of improvement. Unexpectedly by me, aside from a minor bump with 3.6 and an even smaller bump with 3.7, literally none of the new models we've tried have made a significant difference on either our internal benchmarks or in our developers' ability to find new bugs. This includes the new test-time OpenAI models.

This is likely a manifestation of the bitter lesson[1], specifically this part:

> The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, *over a slightly longer time than a typical research project* [like an incremental model update], *massively more computation inevitably becomes available*.

(Emphasis mine.)

Since the ultimate success strategy of the scruffies[2], the proponents of search-and-learning approaches in AI, rests on Moore's law, short-term gains from these strategies will be minuscule; it is over a period of five years or more that their gains are felt most. The neats win the day in the short term, but the hare in this race will ultimately give way to the steady plod of the tortoise.
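
A rough back-of-the-envelope makes the timescale argument concrete. The sketch below is illustrative only: the two-year doubling period for compute per dollar is my assumption (roughly the historical Moore's-law cadence), not a figure from Sutton's essay.

    # Illustrative sketch: how "exponentially falling cost per unit of
    # computation" compounds over different horizons. The doubling period
    # is an assumed parameter, not a measured constant.
    DOUBLING_PERIOD_YEARS = 2.0  # assumption: compute/dollar doubles every ~2 years

    def compute_multiplier(years: float) -> float:
        """Factor by which affordable computation grows over `years`."""
        return 2.0 ** (years / DOUBLING_PERIOD_YEARS)

    for horizon in (0.5, 1, 2, 5, 10):
        print(f"{horizon:>4} years -> {compute_multiplier(horizon):5.1f}x compute")

At that assumed rate, a model update shipped six months later rides only a ~1.2x compute increase, while a five-year horizon sees ~5.7x and a decade ~32x, which is where the tortoise pulls ahead.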

1: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

2: https://en.m.wikipedia.org/wiki/Neats_and_scruffies#CITEREFM...