
385 points meetpateltech | 9 comments
1. nadis ◴[] No.44008123[source]
In the preview video, I appreciated Katy Shi's comment on "I think this is a reflection of where engineering work has moved over the past where a lot of my time now is spent reviewing code rather than writing it."

Preview video from Open AI: https://www.youtube.com/watch?v=hhdpnbfH6NU&t=878s

As I think about what "AI-native" development, or just the future of building software, looks like, it's interesting to me that - right now - developers are still just reading code and tests rather than looking at simulations.

While a new(ish) concept for software development, simulations could provide a wider range of outcomes and, especially for the front end, are far easier to evaluate than code/tests alone. I'm biased because this is something I've been exploring, but it really hit me over the head looking at the Codex launch materials.

replies(2): >>44008199 #>>44010123 #
2. ai-christianson ◴[] No.44008199[source]
> rather than looking at simulations

You mean like automated test suites?

replies(1): >>44008290 #
3. tough ◴[] No.44008290[source]
automated visual fuzzy-testing with some self-reinforcement loops

There are already libraries for QA testing, and VLMs can give critique on a series of screenshots automated by a Playwright script per branch
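
A minimal sketch of that loop, assuming a Playwright-captured screenshot fed to a local vision model served through the Ollama HTTP API (the model name, endpoint, and critique prompt here are all illustrative assumptions, not a specific tool's API):

```python
# Sketch: send a branch's screenshot to a local VLM for visual critique.
# Assumes an Ollama server at localhost:11434 with a vision model pulled.
import base64
import json
import urllib.request

def build_critique_payload(model: str, prompt: str, png_bytes: bytes) -> dict:
    """Build an Ollama /api/generate request; the screenshot goes in the
    `images` field as base64, and `stream: False` asks for one JSON reply."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,
    }

def critique_screenshot(png_bytes: bytes, model: str = "llava") -> str:
    payload = build_critique_payload(
        model,
        "You are a QA reviewer. List any visual defects on this page.",
        png_bytes,
    )
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Screenshot capture would come from a Playwright script, roughly:
#   from playwright.sync_api import sync_playwright
#   with sync_playwright() as p:
#       page = p.chromium.launch().new_page()
#       page.goto("http://localhost:3000")
#       print(critique_screenshot(page.screenshot()))
```

Running this once per branch and diffing the critiques against the base branch is the self-reinforcement loop part.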

replies(1): >>44008539 #
4. ai-christianson ◴[] No.44008539{3}[source]
Cool. Putting vision in the loop is a great idea.

Ambitious idea, but I like it.

replies(2): >>44008641 #>>44009970 #
5. tough ◴[] No.44008641{4}[source]
SmolVLM, Gemma, LLaVA, in case you wanna play with some of the ones I've tried.

https://huggingface.co/blog/smolvlm

Recently both llama.cpp and Ollama got better support for them too, which makes this kind of integration with local/self-hosted models more attainable and less expensive

replies(1): >>44008693 #
6. tough ◴[] No.44008693{5}[source]
also this for the visual regression testing parts, but you can add some AI onto the mix ;) https://github.com/lost-pixel/lost-pixel
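
The core mechanism behind visual regression tools like that is roughly the following (a bare sketch of the idea, not lost-pixel's actual API - real tools add anti-aliasing tolerance and perceptual diffing on top):

```python
# Sketch: compare a baseline screenshot to a candidate pixel by pixel and
# fail the check when the fraction of changed pixels exceeds a threshold.
# Pixels are modeled as (r, g, b) tuples in a flat list for simplicity.
def diff_ratio(baseline: list, candidate: list, tolerance: int = 0) -> float:
    """Fraction of pixels whose per-channel difference exceeds `tolerance`."""
    if len(baseline) != len(candidate):
        raise ValueError("screenshots must have the same dimensions")
    changed = sum(
        1
        for a, b in zip(baseline, candidate)
        if any(abs(x - y) > tolerance for x, y in zip(a, b))
    )
    return changed / len(baseline)

def passes_regression(baseline: list, candidate: list,
                      max_ratio: float = 0.01) -> bool:
    """Pass when at most `max_ratio` of pixels differ from the baseline."""
    return diff_ratio(baseline, candidate) <= max_ratio
```

The AI layer then only has to look at the screenshots that fail this cheap pixel gate.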
7. ericghildyal ◴[] No.44009970{4}[source]
I used Cline to build a tiny testing helper app and this is exactly what it did!

It made changes in TS/Next.js given just the boilerplate from create-next-app, ran `yarn dev`, then opened its mini LLM browser and navigated to localhost to verify everything looked correct.

It found one mistake, fixed the issue, then ran `yarn dev` again, opened a new browser, navigated to localhost (pointing at the original server it brought up, not the new one at another port), and confirmed the change was correct.

I was very impressed, but still laughed at how it somehow backed its way into a flow that worked, but only because Next has hot-reloading.

8. fosterfriends ◴[] No.44010123[source]
++ Kind of my whole thesis with Graphite. As more code gets AI-generated, the weight shifts to review, testing, and integration. Even as someone helping build AI code reviewers, we'll _need_ humans stamping forever - for many reasons, but fundamentally for accountability. A computer can never be held accountable.

https://constelisvoss.com/pages/a-computer-can-never-be-held...

replies(1): >>44010360 #
9. hintymad ◴[] No.44010360[source]
> A computer can never be held accountable

I think the issue is not about humans being entirely replaced. Instead, the issue is that if AI replaces enough knowledge workers while there's no new or expanded market to absorb the workforce, the new balance of supply and demand will mean that many of us will see suppressed pay or, worse, lose our jobs for good.