
435 points crawshaw | 2 comments
kgeist ◴[] No.43998994[source]
Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits yourself.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash; it sounds way too dangerous.
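
For context, the loop I was running by hand is, as I understand it, the thing agents are supposed to automate. A rough sketch of it, assuming the openai Python SDK; the model name, build command, and file name here are just placeholders, not what I actually used:

```python
# Minimal edit-compile-fix loop (sketch only; swap in your own build command and model).
import subprocess
from openai import OpenAI

client = OpenAI()          # reads OPENAI_API_KEY from the environment
SOURCE = "main.go"         # placeholder: the file being repaired

def build_errors() -> str:
    """Run the build and return its error output (empty string if it compiled)."""
    result = subprocess.run(["go", "build", "./..."],
                            capture_output=True, text=True)
    return result.stderr.strip()

code = open(SOURCE).read()
for _ in range(5):                      # cap the iterations so it can't loop forever
    errors = build_errors()
    if not errors:                      # clean build: stop
        break
    resp = client.chat.completions.create(
        model="gpt-4o",                 # placeholder model
        messages=[
            {"role": "system",
             "content": "Return only the complete corrected file, no prose, no fences."},
            {"role": "user",
             "content": f"Fix these build errors:\n{errors}\n\nCurrent file:\n{code}"},
        ],
    )
    code = resp.choices[0].message.content
    open(SOURCE, "w").write(code)       # write back and try the build again
```

The failure mode I hit is exactly the one this sketch doesn't guard against: when the model returns truncated or syntactically broken code, the loop just keeps feeding the wreckage back in.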

replies(30): >>43999028 #>>43999055 #>>43999097 #>>43999162 #>>43999169 #>>43999248 #>>43999263 #>>43999272 #>>43999296 #>>43999300 #>>43999358 #>>43999373 #>>43999390 #>>43999401 #>>43999402 #>>43999497 #>>43999556 #>>43999610 #>>43999916 #>>44000527 #>>44000695 #>>44001136 #>>44001181 #>>44001568 #>>44001697 #>>44002185 #>>44002837 #>>44003198 #>>44003824 #>>44008480 #
nico ◴[] No.43999248[source]
4o and 4.1 are not very good at coding

My best results are usually with o4-mini-high; o3 is sometimes pretty good

I personally don’t like the canvas. I prefer the output in the chat

And a lot of times I say: provide the full code for this file, or provide a drop-in replacement (when I don’t want to deal with all the diffs). But usually at around 300-400 lines of code it starts getting bad, and then I need to refactor to break stuff up into multiple files (unless I can focus on just one method inside a file).

replies(2): >>43999569 #>>43999689 #
manmal ◴[] No.43999569[source]
o3 is shockingly good actually. I can’t use it often due to rate limiting, so I save it for the odd occasion. Today I asked it how I could integrate a tree of Swift binary packages within an SDK, and detect internal version clashes, and it gave a very well-researched and sensible overview. It also gave me a new idea that I’ll try.
replies(2): >>44000022 #>>44000083 #
1. kenjackson ◴[] No.44000022[source]
I use o3 for anything math or coding related. 4o is good for things like, "my knee hurts when I do this and that -- what might it be?"
replies(1): >>44002814 #
2. TeMPOraL ◴[] No.44002814[source]
In ChatGPT, at this point I use 4o pretty much only for image generation; it's the one feature that's unique to it and is mind-blowingly good. For everything else, I default to o3.

For coding, I stick to Claude 3.5 / 3.7 and recently Gemini 2.5 Pro. I sometimes use o3 in ChatGPT when I can't be arsed to fire up Aider, or really need to use its search features to figure out how to do something (e.g. pinouts for some old TFT screens for ESP32 and Raspberry Pi, most recently).