A Research Preview of Codex

(openai.com)

511 points meetpateltech | 5 comments | 16 May 25 15:02 UTC


johnjwang ◴[16 May 25 16:27 UTC] No.44007301[source]▶

Some engineers on my team at Assembled and I have been a part of the alpha test of Codex, and I'll say it's been quite impressive.

We’ve long used local agents like Cursor and Claude Code, so we didn’t expect too much. But Codex shines in a few areas:

Parallel task execution: You can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling. It's super nice to run a bunch of tasks at the same time (something that's really hard to do in Cursor, Cline, etc.).

It kind of feels like a junior engineer on steroids: you just need to point it at a file or function, specify the change, and it scaffolds out most of a PR. You still need to do a lot of work to get it production-ready, but it's as if you now have an infinite number of junior engineers at your disposal, all working on different things.

Model quality is good, but hard to say it's that much better than other models. In side-by-side tests with Cursor + Gemini 2.5-pro, naming, style and logic are relatively indistinguishable, so quality meets our bar but doesn’t yet exceed it.

replies(15): >>44007420 #>>44007425 #>>44007552 #>>44007565 #>>44007575 #>>44007870 #>>44008106 #>>44008575 #>>44008809 #>>44009066 #>>44009783 #>>44010245 #>>44012131 #>>44014948 #>>44016788 #

1. strangescript ◴[16 May 25 16:39 UTC] No.44007425[source]▶

>>44007301 #

It feels like OpenAI is at a ceiling with their models; codex-1 seems to be another RLHF derivative from the same base model. You can see this in their own self-reported o3-high comparison, where at 8 tries they converge at the same accuracy.

It also seems very telling that they have not mentioned o4-high benchmarks at all. o4-mini exists, so logically there is an o4 full model, right?

replies(1): >>44008188 #

2. aorobin ◴[16 May 25 17:58 UTC] No.44008188[source]▶

>>44007425 (TP) #

Seems likely that they are waiting to release o4 full results until the gpt-5 release later this year, presumably because gpt-5 is bundled with a roughly o4-level reasoning capability, and they want gpt-5 to feel like a significant release.

replies(1): >>44008924 #

3. losvedir ◴[16 May 25 19:18 UTC] No.44008924[source]▶

>>44008188 #

Do you still think there will be a gpt-5? I thought the consensus was GPT-5 never really panned out and was released with little fanfare as 4.1.

replies(2): >>44014940 #>>44014958 #

4. aorobin ◴[17 May 25 15:17 UTC] No.44014940{3}[source]▶

>>44008924 #

Yeah, just last month Altman said gpt-5 is coming in a few months, and betting/prediction sites are expecting it this year, probably in the summer.

5. brookst ◴[17 May 25 15:19 UTC] No.44014958{3}[source]▶

>>44008924 #

Marketing names aren’t really connected to product generations. We might target v3 of a product for a date and then decide it’s really 2.4; that doesn’t mean we won’t market something as v3 later.
