The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 1 comments | 15 May 25 19:33 UTC | HN request time: 0.332s | source

Show context

kgeist ◴[15 May 25 20:28 UTC] No.43998994[source]▶

Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertized). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.

replies(31): >>43999028 #>>43999055 #>>43999097 #>>43999162 #>>43999169 #>>43999248 #>>43999263 #>>43999272 #>>43999296 #>>43999300 #>>43999358 #>>43999373 #>>43999390 #>>43999401 #>>43999402 #>>43999497 #>>43999556 #>>43999610 #>>43999916 #>>44000527 #>>44000695 #>>44001136 #>>44001181 #>>44001568 #>>44001697 #>>44002185 #>>44002837 #>>44003198 #>>44003824 #>>44008480 #>>44048487 #

visarga ◴[15 May 25 20:39 UTC] No.43999097[source]▶

>>43998994 #

You should try Cursor or Windsurf, with Claude or Gemini model. Create a documentation file first. Generate tests for everything. The more the better. Then let it cycle 100 times until tests pass.

Normal programming is like walking, deliberate and sure. Vibe coding is like surfing, you can't control everything, just hit yes on auto. Trust the process, let it make mistakes and recover on its own.

replies(2): >>43999458 #>>43999859 #

1. tqwhite ◴[15 May 25 22:11 UTC] No.43999859[source]▶

>>43999097 #

I find that writing a thorough design spec is really worth it. Also, asking for its reaction. "What's missing?" "Should I do X or Y" does good things for its thought process, like engaging a younger programmer in the process.

Definitely, I ask for a plan and then, even if it's obvious, I ask questions and discuss it. I also point it as samples of code that I like with instructions for what is good about it.

Once we have settled on a plan, I ask it to break it into phases that can be tested (I am not one for a unit testing) to lock in progress. Claude LOVES that. It organizes a new plan and, at the end of each phase tells me how to test (curl, command line, whatever is appropriate) and what I should see that represents success.

The most important thing I have figured out is that Claude is a collaborator, not a minion. I agree with visarga, it's much more like surfing that walking. Also, Trust... but Verify.

This is a great time to be a programmer.

↑