The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 2 comments | 15 May 25 19:33 UTC


kgeist ◴[15 May 25 20:28 UTC] No.43998994[source]▶

Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.
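
For reference, the manual routine described above (check the code, paste the errors back, take the revision, repeat) is roughly the part an agent harness automates. Below is a minimal sketch, not any particular tool's implementation. The names `check_python`, `repair_loop`, and `fake_model` are hypothetical; Python's built-in `compile()` stands in for the compiler, and `fake_model` is a hard-coded stub where a real LLM call would go.

```python
from typing import Callable, Optional, Tuple


def check_python(source: str) -> Optional[str]:
    """Return None if the source compiles, otherwise a short error message."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def repair_loop(
    source: str,
    ask_model: Callable[[str, str], str],
    check: Callable[[str], Optional[str]] = check_python,
    max_rounds: int = 5,
) -> Tuple[str, bool, int]:
    """Feed checker errors back to the model until the check passes.

    Returns (final_source, succeeded, model_calls_made). The round cap is
    what stops the loop when the model keeps rewriting without fixing.
    """
    for rounds_used in range(max_rounds + 1):
        error = check(source)
        if error is None:
            return source, True, rounds_used
        if rounds_used == max_rounds:
            break
        # Hypothetical model call: gets the current code plus the raw error.
        source = ask_model(source, error)
    return source, False, max_rounds


def fake_model(source: str, error: str) -> str:
    """Stub standing in for an LLM; it only knows one fix."""
    return source.replace("def f(:", "def f():")


fixed, ok, rounds = repair_loop("def f(:\n    return 1\n", fake_model)
print(ok, rounds)
```

The loop only removes the copy-pasting. Whether the model's revision is any good is a separate question, which is why the sketch caps the number of rounds and reports failure instead of retrying forever.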

replies(31): >>43999028 #>>43999055 #>>43999097 #>>43999162 #>>43999169 #>>43999248 #>>43999263 #>>43999272 #>>43999296 #>>43999300 #>>43999358 #>>43999373 #>>43999390 #>>43999401 #>>43999402 #>>43999497 #>>43999556 #>>43999610 #>>43999916 #>>44000527 #>>44000695 #>>44001136 #>>44001181 #>>44001568 #>>44001697 #>>44002185 #>>44002837 #>>44003198 #>>44003824 #>>44008480 #>>44048487 #

abiraja ◴[15 May 25 20:44 UTC] No.43999162[source]▶

>>43998994 #

GPT-4o and 4.1 are definitely not the best models to use here. Use Claude 3.5/3.7, Gemini Pro 2.5 or o3. All of them work really well for small files.

replies(1): >>44005779 #

1. linsomniac ◴[16 May 25 14:13 UTC] No.44005779[source]▶

>>43999162 #

What are people using to interface with Gemini Pro 2.5? I'm using Claude Code with Claude Sonnet 3.7, and Codex with OpenAI, but Codex with Gemini didn't seem to work very well last week: it kept telling me to go make this or that change in the code rather than doing it itself.

replies(1): >>44019962 #

2. tinodb ◴[18 May 25 08:54 UTC] No.44019962[source]▶

>>44005779 (TP) #

I use Gemini Pro 2.5 from Zed sometimes. But whilst it is good at higher-level architecture on a lot of context, it is quite bad at 1) generating correct diffs that Zed can apply and 2) continuing. It just doesn’t seem to get “tool usage”.
