
511 points meetpateltech | 10 comments
johnjwang ◴[] No.44007301[source]
Some engineers on my team at Assembled and I have been a part of the alpha test of Codex, and I'll say it's been quite impressive.

We’ve long used local agents like Cursor and Claude Code, so we didn’t expect too much. But Codex shines in a few areas:

Parallel task execution: you can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling. It's super nice to run a bunch of tasks at the same time, something that's really hard to do in Cursor, Cline, etc.
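The fan-out pattern described above can be sketched in a few lines. This is a toy illustration, not the Codex interface; `run_agent_task` is a hypothetical stand-in for whatever call dispatches one agent job.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for dispatching one agent task; in practice this
# would block for minutes while the agent edits files and runs tests.
def run_agent_task(description: str) -> str:
    return f"PR draft for: {description}"

tasks = [
    "refactor parse_config() to use dataclasses",
    "add unit tests for retry logic",
    "generate boilerplate for the /health endpoint",
]

# Fan the tasks out concurrently instead of running them one at a time,
# so the wall-clock cost is roughly the slowest task, not the sum.
with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = {pool.submit(run_agent_task, t): t for t in tasks}
    for fut in as_completed(futures):
        print(fut.result())
```

The point is only the shape: each task runs in its own isolated worker, and you collect results as they finish.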

It kind of feels like a junior engineer on steroids: you just need to point it at a file or function, specify the change, and it scaffolds out most of a PR. You still need to do a lot of work to get it production-ready, but it's as if you have an infinite number of junior engineers at your disposal, all working on different things.

Model quality is good, but it's hard to say it's that much better than other models. In side-by-side tests with Cursor + Gemini 2.5 Pro, naming, style, and logic are relatively indistinguishable, so quality meets our bar but doesn't yet exceed it.

replies(15): >>44007420 #>>44007425 #>>44007552 #>>44007565 #>>44007575 #>>44007870 #>>44008106 #>>44008575 #>>44008809 #>>44009066 #>>44009783 #>>44010245 #>>44012131 #>>44014948 #>>44016788 #
1. woah ◴[] No.44007565[source]
> Parallel task execution: you can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling. It's super nice to run a bunch of tasks at the same time, something that's really hard to do in Cursor, Cline, etc.

> It kind of feels like a junior engineer on steroids: you just need to point it at a file or function, specify the change, and it scaffolds out most of a PR. You still need to do a lot of work to get it production-ready, but it's as if you have an infinite number of junior engineers at your disposal, all working on different things.

What's the benefit of this? It sounds like a gimmick for the "AI will replace programmers" headlines. In reality, LLMs complete their tasks within seconds; the time-consuming part is specifying the tasks and then reviewing and correcting the results. What's the point of parallelizing the fastest part of the process?

replies(3): >>44007748 #>>44008121 #>>44008143 #
2. ctoth ◴[] No.44007748[source]
> Each task is processed independently in a separate, isolated environment preloaded with your codebase. Codex can read and edit files, as well as run commands including test harnesses, linters, and type checkers. Task completion typically takes between 1 and 30 minutes, depending on complexity, and you can monitor Codex’s progress in real time.
3. johnjwang ◴[] No.44008121[source]
In my experience, it still takes quite a bit of time (minutes) to run a task on these agentic LLMs (especially with the latest reasoning models), and in Cursor / Cline / other AI code editors, that's enough time for you to get distracted, lose context, and start working on another task.

So the benefit is really that during this "down" time you can do multiple useful things in parallel. Previously, our engineers were just waiting on the Cursor agent to finish; the parallelization means you explicitly turn your brain off one task and move on to a different one.

replies(1): >>44009434 #
4. kfajdsl ◴[] No.44008143[source]
A single response can take a few seconds, but an agentic task can involve dozens of back-and-forths. I've had a fairly complicated Roo Code task take 10 minutes (multiple subtasks).
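Why dozens of back-and-forths add up to minutes: each tool call costs a full model round trip plus execution time. A toy sketch of that loop, with hypothetical names (no real agent framework's API):

```python
# Simulated model: asks for three tool calls, then declares the task done.
def fake_model(history):
    tool_steps = sum(1 for m in history if m["role"] == "tool")
    if tool_steps < 3:
        return {"action": "run_tool", "tool": f"edit_file_{tool_steps}"}
    return {"action": "done", "answer": "task complete"}

def run_agent(task: str):
    history = [{"role": "user", "content": task}]
    rounds = 0
    while True:
        rounds += 1                      # one full model round trip
        reply = fake_model(history)
        if reply["action"] == "done":
            return reply["answer"], rounds
        # Execute the requested tool and feed the result back to the model.
        history.append({"role": "tool", "content": f"ran {reply['tool']}"})

answer, rounds = run_agent("refactor the config loader")
print(answer, rounds)  # → task complete 4
```

Swap the instant `fake_model` for a real reasoning model at tens of seconds per round and a multi-subtask job easily runs ten minutes.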
replies(1): >>44030301 #
5. woah ◴[] No.44009434[source]
In my experience in Cursor with Claude 3.5 and Gemini 2.5, if an agent has run for more than a minute it has usually lost the plot. Maybe the model used in Codex is a new breed?
replies(2): >>44009884 #>>44011235 #
6. odie5533 ◴[] No.44009884{3}[source]
It depends on what level you ask them to work at, but I agree: all of my agent coding is active and usually completes in <15 seconds.
7. scragz ◴[] No.44011235{3}[source]
With Cline you can give it a huge action plan and it will grind away until it's done. With all the context shenanigans that Cursor and Copilot do, they can't handle multiple tasks as well. Then they're farming requests from the user, so they make you click to continue all the time.
replies(1): >>44024470 #
8. woah ◴[] No.44024470{4}[source]
Just tried Cline and it lost the plot and ground away $5
replies(1): >>44030277 #
9. tom_m ◴[] No.44030277{5}[source]
This is the thing people need to be most aware of with all of these tools. I like how Roo Code shows you the costs.

It's going to take a while before companies catch up here and realize the "hidden costs" of AI. In a few years I think the narrative will shift to how efficient one is at using AI.

I can only imagine someone somewhere wanted to know "how much does a line of code cost?" And for decades no one could answer this. Frustrated, the neanderthal businessman now turns to AI and sits in ignorant bliss with his newfound fire, about to burn down his house.

10. tom_m ◴[] No.44030301[source]
How much of it did you read? Haha. That's not anything against you, I'm just pointing out to people that there will be a bunch of folks out there who will never care to read and learn. They just want to mash all the buttons until it works.

When I was a kid, that worked with Nintendo games, sure... but I like to think I've matured beyond that. Then again, I haven't read every little thing returned by the LLM in Roo Code myself, so maybe it's human nature.