OpenAI Codex CLI: Lightweight coding agent that runs in your terminal

(github.com)


gklitt ◴[16 Apr 25 20:37 UTC] No.43710093[source]▶

I tried one task head-to-head with Codex o4-mini vs Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code, and completely misrepresented the architecture - it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong - feels like possibly an issue with loading in the right context and attending to it correctly? That seems like an area that Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

replies(7): >>43710162 #>>43710290 #>>43711286 #>>43713258 #>>43714390 #>>43714966 #>>43716635 #

strangescript ◴[16 Apr 25 23:08 UTC] No.43711286[source]▶

>>43710093 #

Claude Code still feels superior. o4-mini has all sorts of issues. o3 is better but at that point, you aren't saving money so who cares.

I feel like people are sleeping on Claude Code for one reason or another. It's not cheap, but it's by far the best, most consistent experience I have had.

replies(3): >>43711411 #>>43711764 #>>43712470 #

artdigital ◴[17 Apr 25 02:19 UTC] No.43712470[source]▶

>>43711286 #

Claude Code is just way too expensive.

These days I’m using Amazon Q Pro on the CLI. Very similar experience to Claude Code minus a few batteries. But it’s capped at $20/mo and won’t set my credit card on fire.

replies(2): >>43713490 #>>43714291 #

1. aitchnyu ◴[17 Apr 25 05:46 UTC] No.43713490[source]▶

>>43712470 #

Is it using one of these models? https://openrouter.ai/models?q=amazon

Seems 4x costlier than my Aider+OpenRouter setup. Since I'm less about vibes or huge refactoring, my (first and only) bill is under 5 USD with Gemini. These models will halve that.

replies(1): >>43715394 #

2. artdigital ◴[17 Apr 25 11:46 UTC] No.43715394[source]▶

>>43713490 (TP) #

No, Amazon Q is using Amazon Q. You can't change the model; it calls itself "Q" and it's capped at $20 (Q Developer Pro plan). There is also a free tier available - https://aws.amazon.com/q/developer/

It's very much a "Claude Code" in the sense that you have a "q chat" command line command that can do everything from changing files, running shell commands, reading and researching, etc. So I can say "q chat" and then tell it "read this repo and create a README" or whatever else Claude Code can do. It does everything by itself in an agentic way. (I didn't want to say like 'Aider' because the entire appeal of Claude Code is that it does everything itself, like figuring out what files to read/change)

(It calls itself Q, but from my testing it's pretty clear that it's a variant of Claude hosted through AWS, which makes sense considering how much money Amazon pumped into Anthropic.)

replies(2): >>43715475 #>>43718526 #

3. aitchnyu ◴[17 Apr 25 11:55 UTC] No.43715475[source]▶

>>43715394 #

I felt Sonnet 3.7 would cost at least $30 a month for light use. Did they figure out a way to offer it cheaper?

replies(1): >>43715820 #

4. nmcfarl ◴[17 Apr 25 12:29 UTC] No.43715820{3}[source]▶

>>43715475 #

I don’t know what Amazon did, but I use Aider+OpenRouter with Gemini 2.5 Pro and it costs 1/6 of what Sonnet 3.7 does. The aider leaderboard (https://aider.chat/docs/leaderboards/) includes relative pricing these days.

5. dingnuts ◴[17 Apr 25 15:44 UTC] No.43718526[source]▶

>>43715394 #

> the entire appeal of Claude Code is that it does everything itself, like figuring out what files to read/change

How is this appealing? I think I must be getting old, because the idea of letting a language model run wild and run commands on my system -- that's unsanitized input! -- horrifies me! What do you mean, just let it change random files??

I'm going to have to learn a new trade, IDK

replies(2): >>43720584 #>>43725574 #

6. winrid ◴[17 Apr 25 18:38 UTC] No.43720584{3}[source]▶

>>43718526 #

It shows you the diff and you confirm it, asks you before running commands, and doesn't allow accessing files outside the current dir. You can also tell it not to ask again and let it go wild. I've built full features this way, then gone through and cleaned them up a bit afterward.
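
For anyone wondering what a "no files outside the current dir" rule could look like, here is a minimal sketch in Python. It is not how Claude Code or any other tool actually implements it; the function name is_inside and the example paths are hypothetical. The idea is only to resolve both paths and check that the target still sits under the base directory.

```python
from pathlib import Path


def is_inside(base_dir: str, candidate: str) -> bool:
    """Return True if candidate resolves to a location inside base_dir.

    Hypothetical helper for illustration; not taken from any real tool.
    """
    base = Path(base_dir).resolve()
    # Relative candidates are joined onto base; an absolute candidate
    # replaces base entirely, so it is checked on its own.
    target = (base / candidate).resolve()
    try:
        # relative_to raises ValueError when target is not under base.
        target.relative_to(base)
        return True
    except ValueError:
        return False
```

With a base of /tmp/project, a path like src/main.py passes, while ../etc/passwd and /etc/passwd are rejected because they resolve outside the base. A real sandbox would presumably enforce this at the OS level rather than trusting a path check alone.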

7. hmottestad ◴[18 Apr 25 06:32 UTC] No.43725574{3}[source]▶

>>43718526 #

In the OpenAI demo of Codex they said that it’s sandboxed.

It only has access to files within the directory it’s run from, even if it calls tools that could theoretically access files anywhere on your system. It also has networking blocked, again in a sandboxed fashion, so things like curl don’t work either.

I wasn’t particularly impressed with my short test of Codex yesterday. Just the fact that it managed to make any decent changes at all was good, but when it messed up the code it took a long time and a lot of tokens to figure out.

I think we need fine-tuned models that are good at different tasks. A specific fine-tune for fixing syntax errors in Java would be a good start.

In general it also needs to be more proactive in writing and running tests.
