The unreasonable effectiveness of an LLM agent loop with tool use

Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertized). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.

Ive been doing this exactly "manual" setup (actually after a while I jsut wrote a few browser drivers so it is much more snappier but I am getting ahead).

I started with GPT which gave mediocre results, then switched to claude which was a step function improvement - but again grinded when complexity got a bit high. Main problem was after a certain size it did not give good ways break down your projects.

Then I switched to Gemini. This has blown my mind away. I dont even use cursor etc. Just plain old simple prompts and summarization and regular refactoring and it handles itself pretty well. I must have generated 30M tokens so far (in about 3 weeks) and less 1% of "backtracking" needed. i define backtracking as your context has gone so wonky that you have to start all over again.