
435 points crawshaw | 9 comments
kgeist ◴[] No.43998994[source]
Today I tried "vibe-coding" for the first time using GPT-4o and 4.1. I did it manually - just feeding compilation errors, warnings, and suggestions in a loop via the canvas interface. The file was small, around 150 lines.

It didn't go well. I started with 4o:

- It used a deprecated package.

- After I pointed that out, it didn't update all usages - so I had to fix them manually.

- When I suggested a small logic change, it completely broke the syntax (we're talking "foo() } return )))" kind of broken) and never recovered. I gave it the raw compilation errors over and over again, but it didn't even register the syntax was off - just rewrote random parts of the code instead.

- Then I thought, "maybe 4.1 will be better at coding" (as advertised). But 4.1 refused to use the canvas at all. It just explained what I could change - as in, you go make the edits.

- After some pushing, I got it to use the canvas and return the full code. Except it didn't - it gave me a truncated version of the code with comments like "// omitted for brevity".

That's when I gave up.

Do agents somehow fix this? Because as it stands, the experience feels completely broken. I can't imagine giving this access to bash, sounds way too dangerous.
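
For context, the loop I was doing by hand is roughly the sketch below - an agent basically just automates it. Every name here (the chat() helper, the go build command) is a stand-in, not any real tool's API:

    import subprocess

    def chat(prompt: str) -> str:
        """Stand-in for whatever model API you call; not a real client."""
        raise NotImplementedError

    def fix_until_it_compiles(path: str, max_rounds: int = 5) -> bool:
        for _ in range(max_rounds):
            build = subprocess.run(
                ["go", "build", path],   # swap in your own compiler/linter
                capture_output=True, text=True,
            )
            if build.returncode == 0:
                return True              # it compiles, stop looping
            source = open(path).read()
            reply = chat(
                "Fix these compiler errors. Return the FULL file, no omissions.\n\n"
                f"Errors:\n{build.stderr}\n\nCurrent code:\n{source}"
            )
            open(path, "w").write(reply) # trusting the model's rewrite blindly
        return False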

replies(30): >>43999028 #>>43999055 #>>43999097 #>>43999162 #>>43999169 #>>43999248 #>>43999263 #>>43999272 #>>43999296 #>>43999300 #>>43999358 #>>43999373 #>>43999390 #>>43999401 #>>43999402 #>>43999497 #>>43999556 #>>43999610 #>>43999916 #>>44000527 #>>44000695 #>>44001136 #>>44001181 #>>44001568 #>>44001697 #>>44002185 #>>44002837 #>>44003198 #>>44003824 #>>44008480 #
1. theropost ◴[] No.43999169[source]
150 lines? I find it can quickly scale to around 1,500 lines, and then I start being more precise about the classes and functions I'm looking to modify.
replies(1): >>43999409 #
2. jokethrowaway ◴[] No.43999409[source]
It's completely broken for me over 400 lines (Claude 3.7, paid Cursor)

The worst is when I ask for something complex: the model generates 300 lines of good code and then times out or crashes. If I ask it to continue, it messes up the code for good, e.g. it starts generating duplicated code or functions that don't match the rest of the code.

replies(4): >>43999532 #>>43999633 #>>43999652 #>>43999872 #
3. johnsmith1840 ◴[] No.43999532[source]
It's a new skill that takes time to learn. When I started on GPT-3.5, it took me easily six months of daily use before I was making real progress with it.

I regularly generate and run code in the 600-1,000 LOC range.

Not sure you would call it "vibe coding", though, since the details and info you provide it - and how you provide it - are not simple.

I'd say realistically it speeds me up 10x on fresh greenfield projects and maybe 2x on mature systems.

You should be reading the code coming out. The real way to prevent errors is to read the reasoning and logic. The moment you see a misstep, go back and retry the prompt. If that fails, try a new session entirely.

Test-time compute models like o1-pro or the older o1-preview are massively better at not putting errors in your code.

Not sure about the new Claude approach, but true, slow test-time compute models are MASSIVELY better at coding.

replies(1): >>44001775 #
4. fragmede ◴[] No.43999633[source]
what language?
5. koakuma-chan ◴[] No.43999652[source]
Sounds like a Cursor issue
6. tqwhite ◴[] No.43999872[source]
Definitely a new skill to learn. Everyone I know who is having problems is just telling it what to do, not coaching it. It is not an automaton... instructions in, code out. Treat it like a team member who will do the work if you teach it right, and you will have much more success.

But it is definitely a learning process for you.

7. derwiki ◴[] No.44001775{3}[source]
The “go back and try the prompt again” step is the workflow I'd like to see a UX improvement on. Outside of the vibe-coding “accept all” path, reverse traversal is a fairly manual process.
replies(2): >>44002228 #>>44017208 #
8. baq ◴[] No.44002228{4}[source]
Cursor has checkpoints for this, but I feel I've never used them properly; it's easier to reject all and re-prompt. I keep chats short.
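
(Outside Cursor you can fake roughly the same thing with throwaway git commits - a sketch of the idea only, not how Cursor's checkpoints actually work:)

    import subprocess

    def checkpoint(msg: str = "pre-AI checkpoint") -> None:
        # Commit everything before letting the model edit, so "reject all"
        # is just a hard reset back to this commit.
        subprocess.run(["git", "add", "-A"], check=True)
        subprocess.run(["git", "commit", "--allow-empty", "-m", msg], check=True)

    def reject_all() -> None:
        # Throw away everything the model changed since the last checkpoint,
        # including any new untracked files it created.
        subprocess.run(["git", "reset", "--hard", "HEAD"], check=True)
        subprocess.run(["git", "clean", "-fd"], check=True)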
9. johnsmith1840 ◴[] No.44017208{4}[source]
I don't think you will realistically.

Having full control over inputs - and, if something goes wrong, starting a new chat with either a narrower scope or clearer instructions - is basically AGI-level work.

For now, there is nobody but a human who can determine how badly an LLM actually screwed up its logic train.

But maybe you mean pure UI?

I could foresee something like a new context-creation button with a nice UI for choosing what to bring over and what to ditch being pretty useful.

Maybe something like a git-diff-style view? Dropping this paragraph, bringing over that function, all with simple clicks - that would be pretty slick!

I definitely see a future where better cross-chat context connections and information sharing become powerful. Basically git, but for every conversation and all code generated for a project.

Would be crazy hard but also crazy powerful.
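
Very roughly, the shape I'm imagining is something like this (every name here is made up, just to show the idea):

    from dataclasses import dataclass

    @dataclass
    class Chunk:
        id: str
        kind: str   # "prompt", "reasoning", "code", ...
        text: str

    def new_context(history: list[Chunk], keep_ids: set[str]) -> str:
        """Cherry-pick chunks of an old chat into a fresh prompt."""
        kept = [c for c in history if c.id in keep_ids]
        return "\n\n".join(f"[{c.kind}] {c.text}" for c in kept)

    # "drop this paragraph, bring this function" then becomes e.g.
    # new_context(history, keep_ids={"fn-parse", "spec-v2"})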

If my startup blows up, I might try something like that!