OpenAI Codex CLI: Lightweight coding agent that runs in your terminal

(github.com)

516 points mfiguiere | 1 comments | 16 Apr 25 17:24 UTC


gklitt ◴[16 Apr 25 20:37 UTC] No.43710093[source]▶

I tried one task head-to-head with Codex o4-mini vs Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code, and completely misrepresented the architecture - it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong - feels like possibly an issue with loading in the right context and attending to it correctly? That seems like an area that Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

replies(7): >>43710162 #>>43710290 #>>43711286 #>>43713258 #>>43714390 #>>43714966 #>>43716635 #

strangescript ◴[16 Apr 25 23:08 UTC] No.43711286[source]▶

>>43710093 #

Claude Code still feels superior. o4-mini has all sorts of issues. o3 is better but at that point, you aren't saving money so who cares.

I feel like people are sleeping on Claude Code for one reason or another. It's not cheap, but it's by far the best, most consistent experience I have had.

replies(3): >>43711411 #>>43711764 #>>43712470 #

Aeolun ◴[16 Apr 25 23:25 UTC] No.43711411[source]▶

>>43711286 #

> Its not cheap, but its by far the best, most consistent experience I have had.

It’s too expensive for what it does though. And it starts failing rapidly when it exhausts the context window.

replies(2): >>43711801 #>>43712547 #

1. jasonjmcghee ◴[17 Apr 25 00:27 UTC] No.43711801[source]▶

>>43711411 #

If you get the hang of controlling costs, it's much cheaper. If you're exhausting the context window, I'm not surprised you're seeing high cost.

Be aware of the "cache".

Tell it to read specific files. Never use /compact (that'll bust the cache; if you need it, you're going back and forth too much or using too many files at once).

Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT.

Have a clear goal in mind and keep sessions to as few messages as possible.

Write or generate markdown files with the needed documentation using claude.ai, save them in the repo, and tell it to read the relevant file as part of a question.

I'm at about $0.50-0.75 for most "tasks" I give it. I'm not a super heavy user, but it definitely helps me (it's like having a super focused smart intern that makes dumb mistakes).

If I need to feed it a ton of docs etc. for some task, it'll be more in the few $, rather than < $1. But I really only do this to try some prototype with a library Claude doesn't know about (or knows only an outdated version of).

For hobby stuff, it adds up - totally.

For a company, massively worth it. Insanely cheap productivity boost (if developers are responsible / don't get lazy / don't misuse it).
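
To make the cache point concrete, here is a rough back-of-the-envelope sketch of why a busted cache gets expensive. It assumes a simplified billing model: every turn resends the whole conversation so far, and the resent prefix is billed at a discounted "cached read" rate only while the cache is intact. The prices, the `Pricing` class, and `session_cost` are all hypothetical and chosen for illustration; they are not actual Claude Code or API rates, and any cache-write surcharge is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    # Hypothetical prices in dollars per million tokens (illustration only).
    input_per_mtok: float = 3.00
    cached_read_per_mtok: float = 0.30
    output_per_mtok: float = 15.00


def session_cost(turns, base_context, per_turn_input, per_turn_output,
                 cache_intact, pricing=Pricing()):
    """Estimate the cost of a session under the simplified model above.

    Each turn resends the accumulated prefix. When the cache is intact,
    the prefix is billed at the cached-read rate from the second turn on;
    otherwise it is billed at the full input rate on every turn.
    """
    total = 0.0
    prefix = base_context
    for turn in range(turns):
        cached = cache_intact and turn > 0
        prefix_rate = (pricing.cached_read_per_mtok if cached
                       else pricing.input_per_mtok)
        total += prefix * prefix_rate / 1e6
        total += per_turn_input * pricing.input_per_mtok / 1e6
        total += per_turn_output * pricing.output_per_mtok / 1e6
        # The next turn's prefix includes everything sent and generated so far.
        prefix += per_turn_input + per_turn_output
    return total


if __name__ == "__main__":
    # Same made-up session, with and without a working cache.
    args = dict(turns=10, base_context=50_000,
                per_turn_input=500, per_turn_output=1_500)
    print(f"cache intact: ${session_cost(cache_intact=True, **args):.2f}")
    print(f"cache busted: ${session_cost(cache_intact=False, **args):.2f}")
```

Under this model the gap widens with every extra turn and every extra file loaded into context, which is consistent with the advice above about short sessions and reading only specific files.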
