
1479 points sandslash | 5 comments
abdullin No.44316210
Tight feedback loops are the key to working productively with software. I see that in codebases of up to 700k lines of code (legacy 30-year-old 4GL ERP systems).

The best part is that AI-driven systems are fine with running even tighter loops than a sane human would tolerate.

E.g. running the full linting, testing, and E2E/simulation suite after any minor change, or generating 4 versions of a PR for the same task so that the human can just pick the best one.
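A minimal sketch of such a loop (Python; ruff/pytest and the tests/e2e path here are just stand-ins for whatever linter and test suites a project actually uses):

    # watch_loop.py - rerun the full check suite after any source change.
    # Sketch only: polls file mtimes instead of using OS-level watchers.
    import os
    import subprocess
    import time

    CHECKS = [
        ["ruff", "check", "."],          # linting
        ["pytest", "-q"],                # unit tests
        ["pytest", "-q", "tests/e2e"],   # E2E/simulation suite (assumed path)
    ]

    def snapshot(root="."):
        """Map each tracked source file to its last-modified time."""
        mtimes = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                if name.endswith(".py"):
                    path = os.path.join(dirpath, name)
                    mtimes[path] = os.path.getmtime(path)
        return mtimes

    def run_checks():
        for cmd in CHECKS:
            if subprocess.run(cmd).returncode != 0:
                print("FAILED:", " ".join(cmd))
                return
        print("all checks passed")

    last = snapshot()
    while True:
        time.sleep(1)
        current = snapshot()
        if current != last:
            last = current
            run_checks()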

replies(7): >>44316306 >>44316946 >>44317531 >>44317792 >>44318080 >>44318246 >>44318794
OvbiousError No.44316946
I don't think the human is the problem here; it's the time it takes to run the full testing suite.
replies(6): >>44317032 >>44317123 >>44317166 >>44317246 >>44317515 >>44318555
diggan No.44317123
It is kind of a human problem too. That the full testing suite takes X hours to run is no fun either, but it makes the human problem larger.

Say you're Human A, working on a feature. Running the full testing suite takes 2 hours from start to finish. Every change you make to existing code needs to be confirmed not to break existing stuff by running the full testing suite, so for some changes it takes 2 hours before you know with 100% certainty that nothing else breaks. How quickly do you lose interest, and at what point do you give up and either improve the testing suite or just skip that feature/implement it some other way?

Now say you're Robot A working on the same task. The robot doesn't care that each change takes 2 hours to verify; the context is exactly the same, and it's still "a helpful assistant" 48 hours later, still trying to get the feature put together without breaking anything.

If you're feeling brave, you start Robot B and C at the same time.

replies(2): >>44317507 >>44317902
1. abdullin No.44317507
This is the workflow that ChatGPT Codex demonstrates nicely. Launch any number of «robotic» tasks in parallel, then go about your own work. Come back later to review the results and pick the good ones.
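The fan-out itself doesn't need anything fancy. A Python sketch of the idea (run_agent is a hypothetical placeholder for whatever call launches a Codex-style task, not a real OpenAI API):

    # parallel_tasks.py - run the same task N times independently,
    # then let a human review the candidates and keep the best one.
    from concurrent.futures import ThreadPoolExecutor

    def run_agent(task: str, run_id: int) -> str:
        """Hypothetical placeholder: submit the task to an isolated agent
        sandbox and return the resulting diff/PR description."""
        return f"[result of run {run_id} for: {task}]"

    task = "Add rate limiting to the /login endpoint"

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_agent, task, i) for i in range(4)]
        candidates = [f.result() for f in futures]

    # Human review step: inspect each candidate and pick the best.
    for i, result in enumerate(candidates):
        print(f"--- candidate {i} ---")
        print(result)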
replies(1): >>44317620
2. diggan No.44317620
Well, they're demonstrating it somewhat; it's more of a prototype today. The first tell is the low time limit: I think the longest task for me has been 15 minutes before it gives up. The second tell is that it still uses a chat UI, which is simple to implement and familiar, but also kind of lazy. There should be a better UX, especially with the new variations they just added. Off the top of my head, some graph-like UX might have been better.
replies(1): >>44318193
3. abdullin No.44318193
I guess it depends on the case and the approach.

It works really nicely with the following approach (distilled from experiences reported by multiple companies):

(1) Augment the codebase with explanatory texts that describe individual modules, interfaces, and interactions (something that is needed for humans anyway).

(2) Provide an Agent.MD that describes the approach/style/process the AI agent must take. It should also describe how to run all tests (an illustrative sketch follows after this list).

(3) Break the task down into smaller features. For each feature, first ask for a detailed implementation plan (because it is easier to review a plan than 1000 lines of changes spread across a dozen files).

(4) Review the plan and ask for improvements if needed. When it's ready, ask for a draft of the actual pull request.

(5) The system will automatically use all available tests/linting/rules before writing the final PR. Verify it, and provide feedback if some polish is needed.

(6) Launch multiple instances of the "write me an implementation plan" and "implement this plan" tasks, and pick the result that looks best.

This is very similar to git-driven development of large codebases by distributed teams.
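For step (2), the Agent.MD can be short. An illustrative sketch (the sections and commands are examples matching the sketches above, not a standard):

    # Agent.MD (illustrative example)

    ## Style
    - Python 3.11, type hints everywhere, black formatting.
    - Small commits: one logical change per commit.

    ## Process
    - Write a detailed implementation plan first; wait for review
      before writing code.
    - Never change a public interface without flagging it in the plan.

    ## How to run all tests
    - Lint:  ruff check .
    - Unit:  pytest -q
    - E2E:   pytest -q tests/e2e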

Edit: added newlines

replies(1): >>44319532
4. diggan No.44319532
> distilled from experiences reported by multiple companies

Distilled from my experience, I'd still say that the UX is lacking: sequential chat just isn't the right format. I agree with Karpathy that we haven't found the right way of interacting with these OSes yet.

Even granting what you say, the variations were implemented in a rush. Once you've iterated on one variation, you cannot iterate on another variant at the same time, for example.

replies(1): >>44336655
5. abdullin No.44336655
Yes. I believe the experience will get better. Plus, more AI vendors will catch up with OpenAI and offer similar experiences in their products.

It will just take a few months.