
514 points by mfiguiere | 1 comment
gklitt (No.43710093)
I tried one task head-to-head, Codex with o4-mini vs. Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code and completely misrepresented the architecture: it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong. It feels like it could be an issue with loading the right context and attending to it correctly? That seems like an area Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

ilaksh (No.43710290)
Did you try the same exact test with o3 instead? The mini models are meant for speed.
gklitt (No.43710850)
I want to, but I’ve been having trouble getting o3 to work: lots of errors related to model selection.