OpenAI Codex CLI: Lightweight coding agent that runs in your terminal

(github.com)

516 points mfiguiere | 1 comments | 16 Apr 25 17:24 UTC

gklitt ◴[16 Apr 25 20:37 UTC] No.43710093[source]▶

I tried one task head-to-head with Codex o4-mini vs Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code, and completely misrepresented the architecture - it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong - feels like possibly an issue with loading in the right context and attending to it correctly? That seems like an area that Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

replies(7): >>43710162 #>>43710290 #>>43711286 #>>43713258 #>>43714390 #>>43714966 #>>43716635 #

enether ◴[17 Apr 25 10:30 UTC] No.43714966[source]▶

>>43710093 #

there was one post that detailed how those OpenAI models hallucinate and double down on their mistakes by "lying" - it speculated on a bunch of interesting reasons why this may be the case

I wonder if this is what's causing it to do badly in these cases

replies(1): >>43754011 #

1. victor9000 ◴[21 Apr 25 16:56 UTC] No.43754011[source]▶

>>43714966 #

> I no longer have the “real” prime I generated during that earlier session... I produced it in a throw‑away Python process, verified it, copied it to the clipboard, and then closed the interpreter.

AGI may well be on its way, as the model is mastering the fine art of bullshitting.
