The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 2 comments | 15 May 25 19:33 UTC


kgeist ◴[15 May 25 20:28 UTC] No.43998994[source]▶

Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.
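For illustration, the manual loop described above (feed the errors back, take the rewrite, check again) can be sketched in a few lines. This is only a rough sketch under stated assumptions, not how any particular agent works: `ask_model` is a hypothetical stand-in for a real model call, `fake_model` is a stub so the example runs offline, and Python's built-in `compile()` stands in for the compiler.

```python
from typing import Callable, Optional, Tuple


def check_syntax(source: str) -> Optional[str]:
    """Return an error message if the source does not compile, else None."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def repair_loop(
    source: str,
    ask_model: Callable[[str, str], str],  # hypothetical: (source, error) -> new source
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Check the code, hand any error to the model, and repeat.

    Returns the first version that compiles and the number of model calls used.
    Raises RuntimeError if the code is still broken after max_rounds calls.
    """
    error = None
    for rounds_used in range(max_rounds + 1):
        error = check_syntax(source)
        if error is None:
            return source, rounds_used
        if rounds_used < max_rounds:
            source = ask_model(source, error)
    raise RuntimeError(f"still broken after {max_rounds} rounds: {error}")


def fake_model(source: str, error: str) -> str:
    """Stub standing in for an LLM call; always returns one fixed rewrite."""
    return "def foo():\n    return 1\n"


if __name__ == "__main__":
    fixed, rounds = repair_loop("def foo( return 1", fake_model)
    print(f"compiles after {rounds} model call(s)")
```

The round limit is the design choice worth noting: a loop like this stops and reports failure instead of rewriting the file indefinitely when the model never produces something that compiles.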

replies(31): >>43999028 #>>43999055 #>>43999097 #>>43999162 #>>43999169 #>>43999248 #>>43999263 #>>43999272 #>>43999296 #>>43999300 #>>43999358 #>>43999373 #>>43999390 #>>43999401 #>>43999402 #>>43999497 #>>43999556 #>>43999610 #>>43999916 #>>44000527 #>>44000695 #>>44001136 #>>44001181 #>>44001568 #>>44001697 #>>44002185 #>>44002837 #>>44003198 #>>44003824 #>>44008480 #>>44048487 #

ebiester ◴[15 May 25 21:09 UTC] No.43999373[source]▶

>>43998994 #

I get that it's frustrating to be told "skill issue," but using an LLM is absolutely a skill. It takes a combination of understanding the strengths of the various tools, experimenting with them to learn the techniques, and just pure practice.

I think if I were giving access to bash, though, it would definitely be in a docker container for me as well.
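As a rough sketch of what that could look like: the helper below builds a `docker run` command that executes one bash command in a throwaway container. The image name, the `/work` mount point, the memory limit, and the function names are all assumptions made for illustration, not a recommended or complete sandbox setup.

```python
import os
import subprocess
from typing import List


def sandboxed_argv(
    command: str,
    workdir: str,
    image: str = "debian:stable-slim",  # assumption: any image that ships bash
) -> List[str]:
    """Build a docker command line that runs `command` in a disposable container."""
    host_dir = os.path.abspath(workdir)  # bind mounts need an absolute host path
    return [
        "docker", "run",
        "--rm",                       # delete the container when it exits
        "--network", "none",          # no network access from inside
        "--memory", "512m",           # cap memory use
        "-v", f"{host_dir}:/work",    # expose only the project directory
        "-w", "/work",                # start in the mounted directory
        image,
        "bash", "-c", command,
    ]


def run_sandboxed(command: str, workdir: str, timeout: int = 60) -> subprocess.CompletedProcess:
    """Run the command and capture its output; requires Docker to be installed."""
    return subprocess.run(
        sandboxed_argv(command, workdir),
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Print the command line instead of running it, so this works without Docker.
    print(" ".join(sandboxed_argv("ls -la", ".")))
```

Note that the mounted directory is still writable from inside the container, so this limits the blast radius to the project directory rather than removing the risk entirely.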

replies(2): >>44000420 #>>44001070 #

wtetzner ◴[15 May 25 23:36 UTC] No.44000420[source]▶

>>43999373 #

Sure, you can probably get better at it, but is it really worth the effort over just getting better at programming?

replies(5): >>44000470 #>>44000537 #>>44000733 #>>44004139 #>>44005306 #

cheema33 ◴[16 May 25 00:34 UTC] No.44000733[source]▶

>>44000420 #

If you are going to race a fighter jet, and you are on a bicycle, exercising more and eating right will not help. You have to use a better tool.

A good programmer with AI tools will run circles around a good programmer without AI tools.

replies(4): >>44000916 #>>44001217 #>>44001778 #>>44001903 #

jsight ◴[16 May 25 04:42 UTC] No.44001903[source]▶

>>44000733 #

To be fair, that's also what a lot of us used to say about IDEs. In reality, plenty of folks just turned vim into a fighter jet and did just as well without super-heavyweight LLMs.

I'm not totally convinced that we won't see a similar effect here, with some really competitive coders 100% eschewing LLMs and still doing as well as the best that use them.

replies(1): >>44002501 #

1. TeMPOraL ◴[16 May 25 06:55 UTC] No.44002501[source]▶

>>44001903 #

> In reality, plenty of folks just turned vim into a fighter jet and did just as well without super-heavyweight LLMs.

No, they didn't.

You can get vim and Emacs on par with IDEs[0] somewhat easily thanks to Language Server Protocol. You can't turn them into "fighter jets" without "super-heavyweight LLMs" because that's literally what, per GP, makes an editor/IDE a fighter jet. Yes, Emacs has packages for LLM integration, and presumably so does Vim, but the whole "fighter jet vs. bicycle" is entirely about SOTA LLMs being involved or not.

[0] - On par wrt. project-level features IDEs excel at; both editors of course have other aspects that none of the IDEs ever come close to.

replies(1): >>44009906 #

2. jsight ◴[16 May 25 21:21 UTC] No.44009906[source]▶

>>44002501 (TP) #

Honestly, that is a really fair counterpoint. I've been playing with neovim lately and it really feels a lot like some of the earlier IDEs that I used to use but with more modern power and tremendous speed.

Maybe we will all use LLMs one day in neovim too. :)
