
kgeist:
Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.
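
The loop described above (compile, feed the errors back, retry) is exactly what agent harnesses automate. A minimal sketch, assuming a hypothetical ask_llm wrapper around whatever chat-completion API you use - nothing below is a real vendor API:

    import subprocess

    def ask_llm(prompt: str) -> str:
        """Hypothetical wrapper around a chat-completion API (OpenAI, Anthropic, ...)."""
        raise NotImplementedError

    def fix_until_compiles(path: str, max_rounds: int = 5) -> bool:
        for _ in range(max_rounds):
            result = subprocess.run(
                ["go", "build", path],   # substitute your compiler here
                capture_output=True, text=True,
            )
            if result.returncode == 0:
                return True              # clean build
            with open(path) as f:
                source = f.read()
            patched = ask_llm(
                "Fix ONLY the compilation errors below and return the complete file.\n"
                f"Errors:\n{result.stderr}\n\nFile:\n{source}"
            )
            with open(path, "w") as f:
                f.write(patched)         # overwrite and try again
        return False                     # give up after max_rounds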

1. codethief:
The other day I used the Cline plugin for VSCode with Claude to create an Android app prototype from "scratch", i.e. starting from the usual template given to you by Android Studio. It produced several thousand lines of code, there was not a single compilation error, and the app ended up doing exactly what I wanted – modulo a bug or two, which were caused not by the LLM's stupidity but by weird undocumented behavior of the rather arcane Android API in question. (Which is exactly why I wanted a quick prototype.)

After pointing out the bugs to the LLM, it successfully debugged them (with my help/feedback, i.e. I provided the output of the debug messages it had added to the code) and ultimately fixed them. The only downside was that I wasn't quite happy with the quality of the fixes (they were more like dirty hacks), but after another round or two of feedback we got there, too. I'm sure one could solve that more generally, by putting the agent writing the code in a loop with some other code reviewing agent.

2. cheema33:
> I'm sure one could solve that more generally, by putting the agent writing the code in a loop with some other code reviewing agent.

This x 100. I get so much better quality code if I have LLMs review each other's code and apply corrections. It is ridiculously effective.
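
A minimal sketch of that writer/reviewer loop, assuming a hypothetical call_model helper that routes to two different model backends (the model names and prompts are illustrative, not any real API):

    def call_model(model: str, prompt: str) -> str:
        """Hypothetical router to whichever vendor SDK backs `model`."""
        raise NotImplementedError

    def write_with_review(task: str, rounds: int = 2) -> str:
        # First draft from the "writer" model.
        code = call_model("writer-model", f"Write code for this task:\n{task}")
        for _ in range(rounds):
            # A different model critiques the draft.
            review = call_model(
                "reviewer-model",
                "Review this code for bugs and dirty hacks. "
                f"Reply LGTM if it is fine, else list concrete fixes:\n{code}",
            )
            if "LGTM" in review:
                break                    # reviewer is satisfied
            # Writer applies the review comments.
            code = call_model(
                "writer-model",
                "Apply these review comments and return the full file:\n"
                f"{review}\n\nCode:\n{code}",
            )
        return code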

3. lftl:
Can you elaborate a little more on your setup? Are you manually copying and pasting code from one LLM to another, or do you have some automated workflow for this?
4. suddenlybananas:
What was the app? It could plausibly be something that has an open source equivalent already in the training data.
5. htsh:
I have been doing this with Claude Code and OpenAI Codex and/or Cline. One of the three takes the first pass (usually Claude Code, sometimes Codex), then I have Cline with Gemini 2.5 do a "code review" and offer suggestions for fixes before they are applied.
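
A rough single-tool approximation of that two-pass flow, driven from Python via Claude Code's documented non-interactive -p (print) mode; the task wording and file name are illustrative only, and the commenter actually uses a different tool/model for the review step:

    import subprocess

    def claude_p(prompt: str, stdin: str = "") -> str:
        """Run `claude -p PROMPT` (Claude Code print mode) and return stdout."""
        result = subprocess.run(
            ["claude", "-p", prompt],
            input=stdin, capture_output=True, text=True,
        )
        return result.stdout

    # First pass: one agent makes the change (hypothetical task).
    claude_p("Implement the TODO in src/main.py and make the tests pass.")

    # Review pass: pipe the resulting diff to a second invocation acting
    # as reviewer, standing in for the separate review tool/model.
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True).stdout
    print(claude_p("Code-review this diff and suggest concrete fixes.", stdin=diff))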