We went from chatgpt's "oh, look, it looks like python code but everything is wrong" to "here's a full stack boilerplate app that does what you asked and works in 0-shot" inside 2 years. That's the kicker. And the sauce isn't just in the training set, models now do post-training and RL and a bunch of other stuff to get to where we are. Not to mention the insane abilities with extended context (first models were 2/4k max), agentic stuff, and so on.
These kinds of comments are really missing the point.
Showing off moderately complicated results that are actually not indicative of performance because they are sniped by the training data turns this from a cool demo to a parlor trick.
Stating that, aha, jokes on you, that's the status quo, is an even bigger indictment.