←back to thread

516 points mfiguiere | 1 comments | | HN request time: 0.203s | source
Show context
gklitt ◴[] No.43710093[source]
I tried one task head-to-head with Codex o4-mini vs Claude Code: writing documentation for a tricky area of a medium-sized codebase.

Claude Code did great and wrote pretty decent docs.

Codex didn't do well. It hallucinated a bunch of stuff that wasn't in the code, and completely misrepresented the architecture - it started talking about server backends and REST APIs in an app that doesn't have any of that.

I'm curious what went so wrong - feels like possibly an issue with loading in the right context and attending to it correctly? That seems like an area that Claude Code has really optimized for.

I have high hopes for o3 and o4-mini as models so I hope that other tests show better results! Also curious to see how Cursor etc. incorporate o3.

replies(7): >>43710162 #>>43710290 #>>43711286 #>>43713258 #>>43714390 #>>43714966 #>>43716635 #
1. ksec ◴[] No.43716635[source]
Sometimes I see in certain areas AI / LLM is absolutely crushing those jobs, a whole category will be gone in next 5 to 10 years as they are already 80 - 90% mark. They just need another 5 - 10% as they continue to get improvement and they are already cheaper per task.

Sometimes I see an area of AI/LLM that I thought even with 10x efficiency improvement and 10x hardware resources which is 100x in aggregate it will still be no where near good enough.

The truth is probably somewhere in the middle. Which is why I dont believe AGI will be here any time soon. But Assisted Intelligence is no doubt in its iPhone moment and continue for another 10 years before hopefully another breakthrough.