←back to thread

858 points cryptophreak | 1 comments | | HN request time: 0s | source
Show context
wiremine ◴[] No.42936346[source]
I'm going to take a contrarian view and say it's actually a good UI, but it's all about how you approach it.

I just finished a small project where I used o3-mini and o3-mini-high to generate most of the code. I averaged around 200 lines of code an hour, including the business logic and unit tests. Total was around 2200 lines. So, not a big project, but not a throw away script. The code was perfectly fine for what we needed. This is the third time I've done this, and each time I get faster and better at it.

1. I find a "pair programming" mentality is key. I focus on the high-level code, and let the model focus on the lower level code. I code review all the code, and provide feedback. Blindly accepting the code is a terrible approach.

2. Generating unit tests is critical. After I like the gist of some code, I ask for some smoke tests. Again, peer review the code and adjust as needed.

3. Be liberal with starting a new chat: the models can get easily confused with longer context windows. If you start to see things go sideways, start over.

4. Give it code examples. Don't prompt with English only.

FWIW, o3-mini was the best model I've seen so far; Sonnet 3.5 New is a close second.

replies(27): >>42936382 #>>42936605 #>>42936709 #>>42936731 #>>42936768 #>>42936787 #>>42936868 #>>42937019 #>>42937109 #>>42937172 #>>42937188 #>>42937209 #>>42937341 #>>42937346 #>>42937397 #>>42937402 #>>42937520 #>>42938042 #>>42938163 #>>42939246 #>>42940381 #>>42941403 #>>42942698 #>>42942765 #>>42946138 #>>42946146 #>>42947001 #
godelski ◴[] No.42937346[source]

  > I focus on the high-level code, and let the model focus on the lower level code.
Tbh the reason I don't use LLM assistants is because they suck at the "low level". They are okay at mid level and better at high level. I find it's actual coding very mediocre and fraught with errors.

I've yet to see any model understand nuance or detail.

This is especially apparent in image models. Sure, it can do hands but they still don't get 3D space nor temporal movements. It's great for scrolling through Twitter but the longer you look the more surreal they get. This even includes the new ByteDance model also on the front page. But with coding models they ignore context of the codebase and the results feel more like patchwork. They feel like what you'd be annoyed at with a junior dev for writing because not only do you have to go through 10 PRs to make it pass the test cases but the lack of context just builds a lot of tech debt. How they'll build unit tests that technically work but don't capture the actual issues and usually can be highly condensed while having greater coverage. It feels very gluey, like copy pasting from stack overflow when hyper focused on the immediate outcome instead of understanding the goal. It is too "solution" oriented, not understanding the underlying heuristics and is more frustrating than dealing with the human equivalent who says something "works" as evidenced by the output. This is like trying to say a math proof is correct by looking at just the last line.

Ironically, I think in part this is why chat interface sucks too. A lot of our job is to do a lot of inference in figuring out what our managers are even asking us to make. And you can't even know the answer until you're part way in.

replies(3): >>42937928 #>>42938207 #>>42938298 #
lucasmullens ◴[] No.42937928[source]
> But with coding models they ignore context of the codebase and the results feel more like patchwork.

Have you tried Cursor? It has a great feature that grabs context from the codebase, I use it all the time.

replies(3): >>42938048 #>>42939067 #>>42939425 #
pc86 ◴[] No.42938048[source]
I can't get the prompt because I'm on my work computer but I have about a three-quarter-page instruction set in the settings of cursor, it asks clarifying questions a LOT now, and is pretty liberal with adding in commented pseudo-code for stuff it isn't sure about. You can still trip it up if you try, but it's a lot better than stock. This is with Sonnet 3.5 agent chats (composer I think it's called?)

I actually cancelled by Anthropic subscription when I started using cursor because I only ever used Claude for code generation anyway so now I just do it within the IDE.

replies(2): >>42947006 #>>43012127 #
1. slig ◴[] No.42947006{3}[source]
I'm very interested in your prompt and could you be so kind to paste it somewhere and link in your comment, please?