
514 points mfiguiere | 7 comments
gklitt No.43710093
I tried one task head-to-head, Codex with o4-mini versus Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code, and completely misrepresented the architecture - it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong - feels like possibly an issue with loading in the right context and attending to it correctly? That seems like an area that Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

strangescript No.43711286
Claude Code still feels superior. o4-mini has all sorts of issues. o3 is better, but at that point you aren't saving money, so who cares?

I feel like people are sleeping on Claude Code for one reason or another. It's not cheap, but it's by far the best, most consistent experience I have had.


artdigital No.43712470
Claude Code is just way too expensive.

These days I’m using Amazon Q Pro on the CLI. Very similar experience to Claude Code minus a few batteries. But it’s capped at $20/mo and won’t set my credit card on fire.


aitchnyu No.43713490
Is it using one of these models? https://openrouter.ai/models?q=amazon

It seems 4x costlier than my Aider+OpenRouter setup. Since I'm less into vibe coding or huge refactors, my (first and only) bill with Gemini is under $5. These models would halve that.


artdigital No.43715394
No, Amazon Q is using Amazon Q. You can't change the model; it calls itself "Q" and it's capped at $20 (Q Developer Pro plan). There is also a free tier available: https://aws.amazon.com/q/developer/

It's very much a "Claude Code" in the sense that you have a "q chat" command-line command that can do everything from changing files and running shell commands to reading and researching. So I can run "q chat" and then tell it "read this repo and create a README", or anything else Claude Code can do. It does everything by itself in an agentic way. (I didn't want to compare it to Aider, because the entire appeal of Claude Code is that it does everything itself, like figuring out which files to read and change.)

(It calls itself Q, but from my testing it's pretty clear that it's a variant of Claude hosted through AWS, which makes sense considering how much money Amazon pumped into Anthropic.)


aitchnyu No.43715475
I felt Sonnet 3.7 would cost at least $30 a month for light use. Did they figure out a way to offer it cheaper?

nmcfarl No.43715820
I don't know what Amazon did, but I use Aider+OpenRouter with Gemini 2.5 Pro and it costs 1/6 of what Sonnet 3.7 does. The Aider leaderboard (https://aider.chat/docs/leaderboards/) includes relative pricing these days.

dingnuts No.43718526
> the entire appeal of Claude Code is that it does everything itself, like figuring out what files to read/change

How is this appealing? I think I must be getting old, because the idea of letting a language model run wild and execute commands on my system (that's unsanitized input!) horrifies me. What do you mean, just let it change random files??

I'm going to have to learn a new trade, IDK


winrid No.43720584
It shows you the diff and you confirm it; it asks you before running commands and doesn't allow access to files outside the current directory. You can also tell it not to ask again and let it go wild. I've built full features this way and then just gone through and cleaned them up a bit afterward.
hmottestad No.43725574
In the OpenAI demo of codex they said that it’s sandboxed.

It only has access to files within the directory it's run from, even if it calls tools that could theoretically access files anywhere on your system. It also has networking blocked, likewise in a sandboxed fashion, so things like curl don't work either.
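For readers curious what "only has access to files within the directory" means in practice, here is a minimal Python sketch of a path-containment check. It is an illustration under assumptions, not how Codex actually implements its sandbox (the demo did not describe that), and the function name `is_inside_workdir` is hypothetical. It resolves symlinks and `..` segments before comparing, so a path like `src/../../etc/passwd` is rejected.

```python
from pathlib import Path


def is_inside_workdir(candidate: str, workdir: str) -> bool:
    """Hypothetical check: True if candidate resolves to a path inside workdir.

    resolve() collapses '..' and follows symlinks, so escapes such as
    'src/../../etc/passwd' are caught. An absolute candidate replaces
    the root when joined, and is then checked the same way.
    """
    root = Path(workdir).resolve()
    target = (root / candidate).resolve()
    # Inside means: the root itself, or any path that has root as an ancestor.
    return target == root or root in target.parents
```

A check like this inside the agent only covers files the agent opens itself. As the comment notes, a tool the agent invokes could still reach anywhere, which is why a real sandbox would enforce the restriction at the OS level instead.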

I wasn't particularly impressed with my short test of Codex yesterday. The fact that it managed to make any decent changes at all was good, but when it messed up the code, it took a long time and a lot of tokens to figure out what went wrong.

I think we need fine-tuned models that are good at different tasks. A specific fine-tune for fixing syntax errors in Java would be a good start.

In general it also needs to be more proactive in writing and running tests.