The unreasonable effectiveness of an LLM agent loop with tool use

(sketch.dev)

447 points crawshaw | 3 comments | 15 May 25 19:33 UTC

kgeist ◴[15 May 25 20:28 UTC] No.43998994[source]▶

Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.

replies(31): >>43999028 #>>43999055 #>>43999097 #>>43999162 #>>43999169 #>>43999248 #>>43999263 #>>43999272 #>>43999296 #>>43999300 #>>43999358 #>>43999373 #>>43999390 #>>43999401 #>>43999402 #>>43999497 #>>43999556 #>>43999610 #>>43999916 #>>44000527 #>>44000695 #>>44001136 #>>44001181 #>>44001568 #>>44001697 #>>44002185 #>>44002837 #>>44003198 #>>44003824 #>>44008480 #>>44048487 #

danbmil99 ◴[15 May 25 22:18 UTC] No.43999916[source]▶

>>43998994 #

As others have noted, you sound about 3 months behind the leading edge. What you describe is like my experience from February.

Switch to Claude (IMSHO, I think Gemini is considered on par). Use a proper coding tool; cutting & pasting from the chat window is so last week.

replies(1): >>44000927 #

candiddevmike ◴[16 May 25 01:13 UTC] No.44000927[source]▶

>>43999916 #

Instead of churning on frontend frameworks while procrastinating about building things, we've moved on to churning dev setups for micro gains.

replies(2): >>44001511 #>>44001995 #

mycall ◴[16 May 25 03:15 UTC] No.44001511[source]▶

>>44000927 #

> churning dev setups for micro gains.

Devs have been doing micro changes to their setup for 50 years. It is the nature of their beast.

replies(1): >>44001537 #

1. zahlman ◴[16 May 25 03:21 UTC] No.44001537[source]▶

>>44001511 #

Where do people on HN meet these devs who are willing to do this sort of thing, and get anxious about being 3 months behind the latest and greatest?

In my world, they were given 9 years to switch to Python 3 even if you write off 3.0 and 3.1 as premature, and they still missed by years, and loudly complained afterwards.

And they still can't be bothered to learn what a `pyproject.toml` is, let alone actually use it for its intended purpose. One of the most popular third-party Python libraries (Requests), which is under stewardship by the PSF, which uses only Python code, had its "build" (no compilation - purely a matter of writing metadata, shuffling some files around and zipping it up) broken by the removal of years-old functionality in Setuptools that they weren't even actually remotely reliant upon. Twice, in the last year.

replies(1): >>44002387 #

2. guappa ◴[16 May 25 06:35 UTC] No.44002387[source]▶

>>44001537 (TP) #

You just need to be a frontend dev in a very overstaffed team (like where I work) and then you need to fill up your day doing that, creating a task for every couple of lines changed, and requiring multiple approvals to merge anything.

It takes me ~1 week to merge small fixes to their build system (which they don't understand anyway so they just approve whatever).

replies(1): >>44103699 #

3. mycall ◴[27 May 25 03:24 UTC] No.44103699[source]▶

>>44002387 #

How much security patching does that workflow encounter?
