Andrej Karpathy: Software in the era of AI [video]

(www.youtube.com)

1480 points sandslash | 5 comments | 19 Jun 25 00:33 UTC

abdullin [19 Jun 25 07:03 UTC] No.44316210

Tight feedback loops are the key to working productively with software. I see that in codebases of up to 700k lines of code (30-year-old legacy 4GL ERP systems).

The best part is that AI-driven systems are fine with running even tighter loops than a sane human would tolerate.

E.g. running the full linting, testing and E2E/simulation suite after every minor change, or generating 4 versions of a PR for the same task so that the human can just pick the best one.
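The loop described above (run every check after each small change, cheapest first, stop at the first failure) can be sketched in Python. This is a minimal illustration, not the commenter's actual setup; the tool names `ruff` and `pytest` and the `scripts/e2e.py` path in `EXAMPLE_PIPELINE` are placeholder assumptions.

```python
import subprocess
import sys
from typing import List, Optional, Sequence, Tuple

# A check is a (name, argv) pair. The commands below are hypothetical
# placeholders; swap in your project's real lint/test/E2E invocations.
Check = Tuple[str, Sequence[str]]

EXAMPLE_PIPELINE: List[Check] = [
    ("lint", ["ruff", "check", "."]),                         # assumes ruff is installed
    ("unit tests", ["pytest", "-q"]),                         # assumes pytest is installed
    ("e2e/simulation", [sys.executable, "scripts/e2e.py"]),   # hypothetical script
]


def run_checks(checks: Sequence[Check]) -> Optional[str]:
    """Run checks in order and stop at the first failure.

    Returns the name of the first failing check, or None if every check passed.
    Cheap checks go first so a trivial mistake is reported in seconds rather
    than after the slow end-to-end suite.
    """
    for name, argv in checks:
        try:
            result = subprocess.run(list(argv), capture_output=True, text=True)
        except FileNotFoundError:
            # The tool itself is missing: count that as a failed check.
            print(f"[FAIL] {name}: command not found: {argv[0]}")
            return name
        if result.returncode != 0:
            print(f"[FAIL] {name}\n{result.stdout}{result.stderr}")
            return name
        print(f"[ok]   {name}")
    return None
```

Calling `run_checks(EXAMPLE_PIPELINE)` after every edit (from a file watcher or an agent's tool loop, say) gives the tight loop; the same function could also gate each of several candidate PRs, so that the human only picks among versions that are already green.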

bandoti [19 Jun 25 12:36 UTC] No.44318080

>>44316210

Here are a few problems I foresee:

1. People get lazy when presented with four choices they had no hand in creating: they don’t look over all four, they just click one and ignore the others. Why? Because they have ten more of these on the go at once, which diminishes their overall focus.

2. Automated tests, end-to-end simulation, linting, etc.: these tools already exist and work at scale. They should be robust and, ideally, THOROUGHLY reviewed by both AI and humans.

3. AI is good for code reviews and “another set of eyes”, but man, it makes serious mistakes sometimes.

An anecdote for (1): when ChatGPT tries to A/B test me with two answers, it’s incredibly burdensome to read virtually the same thing twice with minimal differences.

Code reviewing four things that do almost the same thing is more of a burden than writing the same thing once myself.

1. abdullin [19 Jun 25 12:40 UTC] No.44318111

>>44318080

A simple rule applies: "No matter what tool created the code, you are still responsible for what you merge into main".

As such, the task of verification still falls into the hands of engineers.

Given that and proper processes, modern tooling works nicely with codebases ranging from 10k LOC (mixed embedded device code with Go backends and Python DS/ML) to 700k LOC (legacy enterprise applications from the mainframe era).

2. bandoti [19 Jun 25 12:48 UTC] No.44318177

>>44318111 (TP)

Agreed. I think engineers following simple Test-Driven Development procedures can write the code, unit tests and integration tests, debug, etc. for a small enough unit, and doing so by default forces tight feedback loops. AI may assist with the particulars, not run the show.
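One way to picture that test-first loop for a small unit is Python's built-in `unittest`. The `parse_amount` helper and its trailing-minus convention are hypothetical, chosen only to give the tests something concrete to pin down; this is a sketch of the red/green rhythm, not anyone's production code.

```python
import unittest
from decimal import Decimal, InvalidOperation


# Step 1 (red): write tests for one small unit before the code exists.
# `parse_amount` is a hypothetical helper for amounts exported by a legacy
# system that writes negative numbers with a trailing minus sign.
class ParseAmountTest(unittest.TestCase):
    def test_plain_amount(self):
        self.assertEqual(parse_amount("1234.50"), Decimal("1234.50"))

    def test_trailing_minus_means_negative(self):
        self.assertEqual(parse_amount("12.00-"), Decimal("-12.00"))

    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(parse_amount("  7.25 "), Decimal("7.25"))

    def test_malformed_input_is_rejected(self):
        for bad in ("12,00", "", "-", "NaN"):
            with self.subTest(bad=bad):
                with self.assertRaises(ValueError):
                    parse_amount(bad)


# Step 2 (green): the smallest implementation that makes the tests pass.
def parse_amount(text: str) -> Decimal:
    s = text.strip()
    negative = s.endswith("-")
    if negative:
        s = s[:-1]
    try:
        value = Decimal(s)
    except InvalidOperation:
        raise ValueError(f"not an amount: {text!r}") from None
    if not value.is_finite():
        # Decimal accepts "NaN" and "Infinity"; a legacy amount never should.
        raise ValueError(f"not an amount: {text!r}")
    return -value if negative else value
```

Running `python -m unittest` on the module before `parse_amount` exists gives the red step, adding the function gives green, and any refactoring afterwards is only done while the tests stay green. The unit is small enough that the whole cycle takes seconds.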

I’m willing to bet that, short of droid-speak or some AI output we can’t even understand, when considering “the system as a whole”, and even with short-term gains in speed, the longevity of any product will be better with real people following current best practices, and perhaps a modest sprinkle of AI.

Why? Because AI is trained on the results of human endeavors and can only work within that framework.

3. ponector [19 Jun 25 12:57 UTC] No.44318268

>>44318111 (TP)

> As such, the task of verification still falls into the hands of engineers.

Even before LLMs, it was common to merge changes that completely broke the test environment. Some people really do skip the verification phase of their work.

4. abdullin [19 Jun 25 12:58 UTC] No.44318282

>>44318177

Agreed. AI is just a tool. Letting it run the show is essentially what vibe-coding is. It is a fun activity for prototyping, but it tends to accumulate problems and tech debt at an astonishing pace.

Code manually crafted by professionals will almost always beat AI-driven code in quality. Yet one still has to find such professionals and wait for them to get the job done.

I think the right balance is somewhere in between: let tools handle the mundane parts (e.g. mechanically rewriting that legacy Progress ABL/4GL code to Kotlin), while human engineers have fun with high-level tasks and shaping the direction of the project.

5. xpe [19 Jun 25 16:08 UTC] No.44319968

>>44318111 (TP)

> A simple rule applies: "No matter what tool created the code, you are still responsible for what you merge into main".

Beware of claims of simple rules.

Take one subset of the problem: code reviews in an organizational environment. How well does the simple rule above work?

The idea of “Person P will take responsibility” is far from clear and often not a good solution. (1) P is fallible. (2) Some consequences are too great to allow one person to trigger them, which is why we have systems and checks. (3) P cannot necessarily right the wrong. (4) No-fault analyses are often better when it comes to long-term solutions, which require a fear-free culture to reduce cover-ups.

But this is bigger than one organization. The effects of software quickly escape organizational boundaries. So when we think about giving more power to AI tooling, we have to be really smart. This means understanding human nature, decision theory, political economy [1], societal norms, and law, and building smart systems (technical and organizational).

Recommending good strategies for making AI-generated code safe is a hard problem. I’d bet it is much harder than even “elite” software developers have contemplated, much less implemented. Training in software helps but is insufficient. I personally have some optimism for formal methods, defense in depth, and carefully implemented human-in-the-loop systems.
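As a rough illustration of defense in depth combined with human-in-the-loop review, a merge policy can be written as independent rules that each have to pass, echoing point (2) above about not letting one person trigger large consequences. Everything here is hypothetical: the `Change` model, the `HIGH_RISK_PREFIXES` paths, and the rule that AI-generated changes need an extra reviewer are assumptions for the sketch, not any real code-hosting API. Branch-protection and required-review settings on hosting platforms play a similar role in practice.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


# Hypothetical model of a change awaiting merge; not tied to any real
# code-hosting API.
@dataclass(frozen=True)
class Change:
    author: str
    ai_generated: bool
    checks_passed: bool                # lint, tests, E2E, etc.
    touched_paths: FrozenSet[str]
    approvals: FrozenSet[str] = frozenset()


# Hypothetical areas where a mistake is too costly for one person to own.
HIGH_RISK_PREFIXES = ("billing/", "auth/", "migrations/")


def merge_blockers(change: Change) -> List[str]:
    """Return the reasons a change may not merge; an empty list means it may.

    Each rule is an independent layer, so one layer failing (e.g. a reviewer
    skimming) does not by itself let the change through.
    """
    reasons = []
    if not change.checks_passed:
        reasons.append("automated checks failing")
    # The author's own approval never counts: no self-review.
    reviewers = change.approvals - {change.author}
    required = 1
    if change.ai_generated:
        required += 1  # assumption: AI-generated changes get a second human reviewer
    if any(p.startswith(HIGH_RISK_PREFIXES) for p in change.touched_paths):
        required += 1  # two-person rule (or stricter) for high-risk areas
    if len(reviewers) < required:
        reasons.append(
            f"needs {required} independent human approval(s), has {len(reviewers)}"
        )
    return reasons
```

Returning a list of reasons rather than a bare boolean keeps the decision explainable to the humans in the loop, which matters for the no-fault analyses mentioned above.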

[1] Political economy uses many of the tools of economics to study the incentives of human decision making
