Recent AI model progress feels mostly like bullshit

I agree on both points: benchmarks not being relevant to actual use cases, and the "wants to sound smart" issue. I have seen both firsthand while interacting with LLMs.

I think the ability to embed arbitrary knowledge written in arbitrary formats is the most important thing LLMs have achieved.

In my experience, trying to get an LLM to perform a task as vast and open-ended as the one the author describes is fundamentally misguided. LLMs were not trained for that and won't be able to do it to a satisfactory degree. But all this research has thankfully given us the software and hardware tools with which one could start training a model that can.

Contrast that with 5-6 years ago, when all you could hope for with this kind of thing was simple rule-based and pattern-matching systems.
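
To make concrete what I mean by rule based and pattern matching, here is a toy sketch. The rules, labels, and the `classify` helper are all made up for illustration and are not taken from any real system; the point is only that anything not anticipated by a hand-written pattern falls through.

```python
import re

# Hypothetical hand-written rules: (pattern, label). Invented for illustration.
RULES = [
    (re.compile(r"\breset\b.*\bpassword\b", re.IGNORECASE), "password_reset"),
    (re.compile(r"\b(refund|money back)\b", re.IGNORECASE), "refund_request"),
]


def classify(text: str) -> str:
    """Return the label of the first rule whose pattern matches, else 'unknown'."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "unknown"


# Phrasing the rule author anticipated is matched.
print(classify("How do I reset my password?"))
# The same intent in different words is not covered by any rule.
print(classify("I forgot my login credentials"))
```

Every new phrasing needs a new rule, which is roughly why this approach never scaled to open-ended input.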