The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 5 comments | 15 May 25 19:33 UTC

kgeist ◴[15 May 25 20:28 UTC] No.43998994[source]▶

Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.
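
A minimal sketch of the manual loop described above (check the code, hand the error back, try again), which is roughly the part an agent automates. Everything here is an assumption for illustration: `check` stands in for the compiler, `propose_fix` stands in for the model call, and `toy_fixer` is a deliberately dumb placeholder, not a real LLM API.

```python
from typing import Callable, Optional, Tuple


def feedback_loop(
    source: str,
    check: Callable[[str], Optional[str]],
    propose_fix: Callable[[str, str], str],
    max_rounds: int = 5,
) -> Tuple[str, bool]:
    """Check the code and feed any error back until it passes or rounds run out.

    check returns None on success, or an error message (e.g. compiler output).
    propose_fix is a hypothetical stand-in for the model: (source, error) -> new source.
    """
    for _ in range(max_rounds):
        error = check(source)
        if error is None:
            return source, True
        source = propose_fix(source, error)
    # Out of rounds: report whether the last attempt happens to pass.
    return source, check(source) is None


def python_syntax_check(source: str) -> Optional[str]:
    """Stand-in for a compiler: report a syntax error without running the code."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


def toy_fixer(source: str, error: str) -> str:
    """Toy replacement for the model: append one ')' per round, ignoring the error."""
    return source + ")"


# Two closing parentheses are missing, so the loop needs two fix rounds.
fixed, ok = feedback_loop("print(len((1, 2)", python_syntax_check, toy_fixer)
```

The `max_rounds` cap is the design choice that matters: a fixer that never converges, like the "never recovered" case above, makes the loop stop and report failure instead of spinning forever.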

replies(31): >>43999028 #>>43999055 #>>43999097 #>>43999162 #>>43999169 #>>43999248 #>>43999263 #>>43999272 #>>43999296 #>>43999300 #>>43999358 #>>43999373 #>>43999390 #>>43999401 #>>43999402 #>>43999497 #>>43999556 #>>43999610 #>>43999916 #>>44000527 #>>44000695 #>>44001136 #>>44001181 #>>44001568 #>>44001697 #>>44002185 #>>44002837 #>>44003198 #>>44003824 #>>44008480 #>>44048487 #

fsndz ◴[15 May 25 21:11 UTC] No.43999390[source]▶

>>43998994 #

It can be frustrating at times, but in my experience, the more you try, the better you become at knowing what to ask and what to expect. But I guess you understand now why some people say vibe coding is a bit overrated: https://www.lycee.ai/blog/why-vibe-coding-is-overrated

replies(1): >>43999880 #

the_af ◴[15 May 25 22:14 UTC] No.43999880[source]▶

>>43999390 #

"Overrated" is one way to call it.

Giving sharp knives to monkeys would be another.

replies(3): >>44002194 #>>44002197 #>>44002443 #

lnenad ◴[16 May 25 05:51 UTC] No.44002197[source]▶

>>43999880 #

Why do people keep thinking they're intellectually superior when negatively evaluating something that is OBVIOUSLY working for a very large percentage of people?

replies(4): >>44002398 #>>44002464 #>>44005289 #>>44008691 #

1. guappa ◴[16 May 25 06:36 UTC] No.44002398[source]▶

>>44002197 #

Because the large percentage of people is a few people doing hello worlds or things of similar difficulty.

Not every software developer is hired to do trivial frontend work.

replies(2): >>44003216 #>>44006810 #

2. FeepingCreature ◴[16 May 25 09:00 UTC] No.44003216[source]▶

>>44002398 (TP) #

The large percentage of software development is people doing hello world or things of similar difficulty. "CRUD apps," remember?

replies(1): >>44005304 #

3. the_af ◴[16 May 25 13:30 UTC] No.44005304[source]▶

>>44003216 #

Hopefully they are not live-coding that crap though. Do you want to make those apps even more unreliable than they already are, and encourage devs not to learn any lessons (as vibe coding prescribes)?

4. lnenad ◴[16 May 25 15:44 UTC] No.44006810[source]▶

>>44002398 (TP) #

Sure, you keep telling that to yourself.

replies(1): >>44026760 #

5. guappa ◴[19 May 25 05:43 UTC] No.44026760[source]▶

>>44006810 #

It seems to me that you're unable to understand that not everyone is a copy of you and they might have different life experiences.

This is a very important flaw that you should probably seek to correct.
