The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 3 comments | 15 May 25 19:33 UTC

kgeist ◴[15 May 25 20:28 UTC] No.43998994[source]▶

Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.

danbmil99 ◴[15 May 25 22:18 UTC] No.43999916[source]▶

>>43998994 #

As others have noted, you sound about 3 months behind the leading edge. What you describe is like my experience from February.

Switch to Claude (IMSHO; I think Gemini is considered on par). Use a proper coding tool; cutting & pasting from the chat window is so last week.

replies(1): >>44000927 #

candiddevmike ◴[16 May 25 01:13 UTC] No.44000927[source]▶

>>43999916 #

Instead of churning on frontend frameworks while procrastinating about building things, we've moved on to churning on dev setups for micro gains.

replies(2): >>44001511 #>>44001995 #

1. latentsea ◴[16 May 25 05:05 UTC] No.44001995[source]▶

>>44000927 #

The amount of time spent churning on workflows and setups will offset the gains.

It's somewhat ironic: the further behind the leading edge you are, the more efficiently you eventually make the gains, because you don't waste time on the micro-gain churn and a bigger set of upgrades arrives all at once when you get back to the leading edge.

I watched this dynamic play out so many times in the image generation space, with people spending hundreds of hours crafting workflows to get around deficiencies in models, posting tutorials about it, and other people spending all that time learning those workflows. Then a new model comes out and boom, it's all nullified and the churn starts all over again. I eventually got sick of the churn. Batching the gains worked better.

replies(1): >>44002723 #

2. TeMPOraL ◴[16 May 25 07:38 UTC] No.44002723[source]▶

>>44001995 (TP) #

Missing from your description is that at least some of that work of "people spending hundreds of hours crafting workflows to get around deficiencies in models, posting tutorials about it, other people spending all the time to learn those workflows" is exactly what informed model developers about the major problems and which solutions seem most promising. All these workarounds are organically crowd-sourced R&D, which is arguably one of the most impressive things about the whole image generation space. The community around ComfyUI is pretty much a shapeless distributed research organization.

replies(1): >>44024933 #

3. latentsea ◴[18 May 25 22:58 UTC] No.44024933[source]▶

>>44002723 #

You're 100% right. I definitely witnessed and picked up on that too. Good callout.
