385 points by meetpateltech | 4 comments

nadis ◴[] No.44008123[source]
In the preview video, I appreciated Katy Shi's comment on "I think this is a reflection of where engineering work has moved over the past where a lot of my time now is spent reviewing code rather than writing it."

Preview video from OpenAI: https://www.youtube.com/watch?v=hhdpnbfH6NU&t=878s

As I think about what "AI-native" or just the future of building software looks like, it's interesting to me that, right now, developers are still just reading code and tests rather than looking at simulations.

While a new(ish) concept for software development, simulations could provide a wider range of outcomes and, especially for the front end, are far easier to evaluate than code/tests alone. I'm biased because this is something I've been exploring, but it really hit me over the head looking at the Codex launch materials.

replies(2): >>44008199 #>>44010123 #
ai-christianson ◴[] No.44008199[source]
> rather than looking at simulations

You mean like automated test suites?

replies(1): >>44008290 #
tough ◴[] No.44008290[source]
automated visual fuzzy-testing with some self-reinforcement loops

There are already libraries for QA testing, and VLMs can give critique on a series of screenshots automated by a Playwright script per branch.
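
For the screenshot side, a rough sketch of that "Playwright script per branch" idea could look like the following (the route list, branch variable, and output paths are placeholders, not any particular project's setup):

```ts
// Sketch: walk a few routes on a preview build and save full-page
// screenshots for a VLM to critique. Routes, branch name, and paths
// are made up for illustration.
import { chromium } from "playwright";

const routes = ["/", "/pricing", "/settings"];      // hypothetical pages under review
const branch = process.env.BRANCH_NAME ?? "local";  // e.g. set by CI per branch

async function captureScreenshots(baseUrl: string): Promise<string[]> {
  const browser = await chromium.launch();
  const page = await browser.newPage({ viewport: { width: 1280, height: 800 } });
  const files: string[] = [];

  for (const route of routes) {
    await page.goto(new URL(route, baseUrl).toString());
    const name = route === "/" ? "home" : route.slice(1).replace(/\//g, "_");
    const file = `screenshots/${branch}-${name}.png`;
    await page.screenshot({ path: file, fullPage: true });
    files.push(file);
  }

  await browser.close();
  return files;
}

captureScreenshots("http://localhost:3000").then((files) =>
  console.log("captured:", files.join(", "))
);
```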

replies(1): >>44008539 #
1. ai-christianson ◴[] No.44008539{3}[source]
Cool. Putting vision in the loop is a great idea.

Ambitious idea, but I like it.

replies(2): >>44008641 #>>44009970 #
2. tough ◴[] No.44008641[source]
SmolVLM, Gemma, LLaVA, in case you wanna play with some of the ones I've tried.

https://huggingface.co/blog/smolvlm

Recently both llama.cpp and Ollama got better support for them too, which makes this kind of integration with local/self-hosted models more attainable and less expensive.
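
To make the local/self-hosted part concrete, here is a minimal sketch that asks a vision model running behind Ollama's generate endpoint to critique one of the screenshots captured above (the model name, prompt, and file path are assumptions, not a prescribed setup):

```ts
// Sketch: send a screenshot to a locally hosted VLM via Ollama's
// /api/generate endpoint and ask for a UI critique.
import fs from "node:fs";

async function critiqueScreenshot(path: string): Promise<string> {
  const image = fs.readFileSync(path).toString("base64");

  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llava",    // placeholder: any vision-capable model pulled into Ollama
      prompt: "Critique this UI screenshot: layout issues, broken elements, unreadable text.",
      images: [image],   // Ollama accepts base64-encoded images for multimodal models
      stream: false,
    }),
  });

  const data = (await res.json()) as { response: string };
  return data.response;
}

critiqueScreenshot("screenshots/local-home.png").then(console.log);
```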

replies(1): >>44008693 #
3. tough ◴[] No.44008693[source]
Also this, for the visual regression testing part, but you can add some AI into the mix ;) https://github.com/lost-pixel/lost-pixel
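
To illustrate the underlying idea only (this is not lost-pixel's own config), a baseline-vs-branch pixel diff can be sketched with pixelmatch and pngjs; the file paths are placeholders:

```ts
// Sketch: compare a baseline screenshot against the current branch's
// screenshot, write a diff image, and count changed pixels.
// Assumes both images have the same dimensions.
import fs from "node:fs";
import { PNG } from "pngjs";
import pixelmatch from "pixelmatch";

const baseline = PNG.sync.read(fs.readFileSync("screenshots/main-home.png"));
const current = PNG.sync.read(fs.readFileSync("screenshots/local-home.png"));
const { width, height } = baseline;
const diff = new PNG({ width, height });

// threshold: per-pixel color-distance sensitivity (0..1, lower = stricter)
const changed = pixelmatch(baseline.data, current.data, diff.data, width, height, {
  threshold: 0.1,
});

fs.writeFileSync("screenshots/diff-home.png", PNG.sync.write(diff));
console.log(`${changed} of ${width * height} pixels differ`);
```
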
4. ericghildyal ◴[] No.44009970[source]
I used Cline to build a tiny testing helper app and this is exactly what it did!

It made changes in TS/Next.js given just the boilerplate from create-next-app, ran `yarn dev`, then opened its mini LLM browser and navigated to localhost to verify everything looked correct.

It found one mistake and fixed the issue, then ran `yarn dev` again, opened a new browser, navigated to localhost (pointing at the original server it had brought up, not the new one on another port), and confirmed the change was correct.

I was very impressed, but still laughed at how it somehow backed its way into a flow that worked only because Next has hot-reloading.