385 points by meetpateltech | 4 comments

nadis ◴[] No.44008123[source]
In the preview video, I appreciated Katy Shi's comment on "I think this is a reflection of where engineering work has moved over the past where a lot of my time now is spent reviewing code rather than writing it."

Preview video from OpenAI: https://www.youtube.com/watch?v=hhdpnbfH6NU&t=878s

As I think about what "AI-native" or just the future of building software looks like, it's interesting to me that, right now, developers are still just reading code and tests rather than looking at simulations.

While a new(ish) concept for software development, simulations could provide a wider range of outcomes and, especially for the front end, are far easier to evaluate than code/tests alone. I'm biased because this is something I've been exploring, but it really hit me over the head looking at the Codex launch materials.

replies(2): >>44008199 #>>44010123 #
ai-christianson ◴[] No.44008199[source]
> rather than looking at simulations

You mean like automated test suites?

replies(1): >>44008290 #
tough ◴[] No.44008290[source]
automated visual fuzzy-testing with some self-reinforcement loops

There are already libraries for QA testing, and VLMs can give critique on a series of screenshots automated by a Playwright script per branch.
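
For the screenshot side, a rough sketch of that "Playwright script per branch" idea could look like the following (the route list, branch variable, and output paths are placeholders, not any particular project's setup):

```ts
// Sketch: walk a few routes on a preview build and save full-page
// screenshots for a VLM to critique. Routes, branch name, and paths
// are made up for illustration.
import { chromium } from "playwright";

const routes = ["/", "/pricing", "/settings"];      // hypothetical pages under review
const branch = process.env.BRANCH_NAME ?? "local";  // e.g. set by CI per branch

async function captureScreenshots(baseUrl: string): Promise<string[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage({ viewport: { width: 1280, height: 800 } });
  const files: string[] = [];

  for (const route of routes) {
    await page.goto(new URL(route, baseUrl).toString());
    const name = route === "/" ? "home" : route.slice(1).replace(/\//g, "_");
    const file = `screenshots/${branch}-${name}.png`;
    await page.screenshot({ path: file, fullPage: true });
    files.push(file);
  }

  await browser.close();
  return files;
}

captureScreenshots("http://localhost:3000").then((files) =>
  console.log("captured:", files.join(", "))
);
```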

replies(1): >>44008539 #
1. ai-christianson ◴[] No.44008539{3}[source]
Cool. Putting vision in the loop is a great idea.

Ambitious idea, but I like it.

replies(2): >>44008641 #>>44009970 #
2. tough ◴[] No.44008641[source]
SmolVLM, Gemma, LLaVA, in case you wanna play with some of the ones I've tried.

https://huggingface.co/blog/smolvlm

Recently both llama.cpp and Ollama got better support for them too, which makes this kind of integration with local/self-hosted models more attainable and less expensive.
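
To make the local/self-hosted part concrete, here is a minimal sketch that asks a vision model running behind Ollama's generate endpoint to critique one of the screenshots captured above (the model name, prompt, and file path are assumptions, not a prescribed setup):

```ts
// Sketch: send a screenshot to a locally hosted VLM via Ollama's
// /api/generate endpoint and ask for a UI critique.
import fs from "node:fs";

async function critiqueScreenshot(path: string): Promise<string> {
  const image = fs.readFileSync(path).toString("base64");

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llava",    // placeholder: any vision-capable model pulled into Ollama
      prompt: "Critique this UI screenshot: layout issues, broken elements, unreadable text.",
      images: [image],   // Ollama accepts base64-encoded images for multimodal models
      stream: false,
    }),
  });

  const data = (await res.json()) as { response: string };
  return data.response;
}

critiqueScreenshot("screenshots/local-home.png").then(console.log);
```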

replies(1): >>44008693 #
3. tough ◴[] No.44008693[source]
Also this, for the visual regression testing part, but you can add some AI into the mix ;) https://github.com/lost-pixel/lost-pixel
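
To illustrate the underlying idea only (this is not lost-pixel's own config), a baseline-vs-branch pixel diff can be sketched with pixelmatch and pngjs; the file paths are placeholders:

```ts
// Sketch: compare a baseline screenshot against the current branch's
// screenshot, write a diff image, and count changed pixels.
// Assumes both images have the same dimensions.
import fs from "node:fs";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";

const baseline = PNG.sync.read(fs.readFileSync("screenshots/main-home.png"));
const current = PNG.sync.read(fs.readFileSync("screenshots/local-home.png"));
const { width, height } = baseline;
const diff = new PNG({ width, height });

// threshold: per-pixel color-distance sensitivity (0..1, lower = stricter)
const changed = pixelmatch(baseline.data, current.data, diff.data, width, height, {
  threshold: 0.1,
});

fs.writeFileSync("screenshots/diff-home.png", PNG.sync.write(diff));
console.log(`${changed} of ${width * height} pixels differ`);
```
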
4. ericghildyal ◴[] No.44009970[source]
I used Cline to build a tiny testing helper app and this is exactly what it did!

It made changes in TS/Next.js given just the boilerplate from create-next-app, ran `yarn dev`, then opened its mini LLM browser and navigated to localhost to verify everything looked correct.

It found one mistake and fixed the issue, then ran `yarn dev` again, opened a new browser, navigated to localhost (pointing at the original server it had brought up, not the new one on another port), and confirmed the change was correct.

I was very impressed, but still laughed at how it somehow backed its way into a flow that worked only because Next has hot-reloading.