A Research Preview of Codex

Is anyone using any of these tools to write non-boilerplate code?

I'm very interested.

In my experience ChatGPT and Gemini are absolutely terrible at these types of things. They are constantly wrong. I know I'm not saying anything new, but I'm waiting to personally experience an LLM that does something useful with any of the code I give it.

These tools aren't useless. They're great as search engines and for pointing me in the right direction. They write dumb bash scripts that save me time here and there. That's it.

And it's hilarious to me how these people present these tools. It generates a bunch of code, and then you spend all your time auditing and fixing what is expected to be wrong.

That's not the type of code I'm putting in my company's code base, and I could probably write the damn code more correctly in less time than it takes to review for expected errors.

What am I missing?

Firstly, LLM chat interfaces != agentic coding platforms.

ChatGPT is good for asking questions about languages, SDKs, and APIs, or generating boilerplate, but it's useless if you want to give an AI a ticket and have it raise PRs for you.

This is where you need agentic solutions like Codex, which will be far more useful because they actually have access to your codebase and a dev environment where they can test and debug changes.

They still do really dumb things, but a lot of this can be avoided if you prompt well and give them the right types of problems to solve.

In my experience, at the moment there's a sweet spot with these agentic coding platforms: semi-complicated tasks. Assuming you prompt well, they can generate 90% of the code you need, then you just need to spend the remaining 10% fixing it up before it's ready for prod.

For tasks that are too simple (a few lines), it's a waste of time. You spend longer prompting and going back and forth with the agent than it would take to just make the change yourself.

Then, obviously, coding agents really struggle with very complicated tasks, especially ones that require some thought around architecture and performance. That's less because they can't do it and more because, for certain problems, simply meeting the ACs (acceptance criteria) is far less important than how they're being met. Ideally here you want to get the architecture right first, then once that's in place you can break down the remaining work for the AI to pick up.