We went from ChatGPT's "oh, look, it looks like Python code but everything is wrong" to "here's a full-stack boilerplate app that does what you asked and works zero-shot" inside of two years. That's the kicker. And the sauce isn't just in the training set: models now go through post-training, RL, and a bunch of other stuff to get to where we are. Not to mention the insane abilities with extended context (the first models maxed out at 2k/4k tokens), agentic workflows, and so on.
These kinds of comments are really missing the point.
I disagree. In my experience, asking coding tools to produce something similar to all of the tutorials and example code out there works amazingly well.
Asking them to produce novel output that doesn’t match the training set produces very different results.
When I recently tried multiple coding agents on a somewhat unique task, they all struggled, continuously trying to pull the solution back toward the standard examples. It felt like an endless loop: the models would grind through a solution and then spit out something that matched the common examples, I would remind them of the unique properties of the task, and they would start all over again, eventually arriving back in the same spot.
It shows the reality of working with LLMs, and it's an important consideration.