Andrej Karpathy: Software in the era of AI [video]

(www.youtube.com)

abdullin ◴[19 Jun 25 07:03 UTC] No.44316210[source]▶

Tight feedback loops are the key in working productively with software. I see that in codebases up to 700k lines of code (legacy 30yo 4GL ERP systems).

The best part is that AI-driven systems are fine with running even tighter loops than a sane human would tolerate.

E.g., running the full linting, testing, and E2E/simulation suite after any minor change. Or generating 4 versions of PR for the same task so that the human could just pick the best one.
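For illustration, a minimal sketch of such a loop: run every check after a change, then rank candidate PRs by how many checks they fail so a human only looks at the cleanest ones. The names (`run_checks`, `rank_candidates`) and the placeholder commands are assumptions made up for this example, not any particular tool's API; swap in whatever linter, test runner, and E2E suite the project actually uses.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every check command and record pass/fail plus captured output."""
    results = []
    for name, cmd in checks.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append(
            CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
        )
    return results


def all_green(results: list[CheckResult]) -> bool:
    """True only if every check passed."""
    return all(r.passed for r in results)


def rank_candidates(candidates: dict[str, list[CheckResult]]) -> list[str]:
    """Order candidate names by number of failed checks, fewest first."""
    def failures(name: str) -> int:
        return sum(not r.passed for r in candidates[name])
    return sorted(candidates, key=failures)


if __name__ == "__main__":
    # Placeholder commands (assumption): stand-ins for a real lint/test/E2E suite.
    checks = {
        "lint": [sys.executable, "-c", "print('lint ok')"],
        "tests": [sys.executable, "-c", "print('tests ok')"],
    }
    results = run_checks(checks)
    for r in results:
        print(r.name, "PASS" if r.passed else "FAIL")
    print("all green:", all_green(results))
```

Note that this only filters out candidates that fail automated checks; deciding which passing candidate is actually the best one still takes a human who knows the code base.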

replies(7): >>44316306 #>>44316946 #>>44317531 #>>44317792 #>>44318080 #>>44318246 #>>44318794 #

latexr ◴[19 Jun 25 11:54 UTC] No.44317792[source]▶

>>44316210 #

> Or generating 4 versions of PR for the same task so that the human could just pick the best one.

That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality. Why are we doing this to ourselves and embracing it?

A few years ago, it would have been seen as a joke to say “the future of software development will be to have a million monkey interns banging on one million keyboards and submit a million PRs, then choose one”. Today, it’s lauded as a brilliant business and cost-saving idea.

We’re beyond doomed. The first major catastrophe caused by sloppy AI code can’t come soon enough. The sooner it happens, the better chance we have to self-correct.

replies(6): >>44317876 #>>44317884 #>>44317997 #>>44318175 #>>44318235 #>>44318625 #

1. bonoboTP ◴[19 Jun 25 12:08 UTC] No.44317876[source]▶

>>44317792 #

If it's monkeylike quality and you need a million tries, it's shit. If you need four tries and one of those is top-tier professional programmer quality, then it's good.

replies(4): >>44317938 #>>44317975 #>>44318876 #>>44319399 #

2. agos ◴[19 Jun 25 12:18 UTC] No.44317938[source]▶

>>44317876 (TP) #

if the thing producing the four PRs can't distinguish the top tier one, I have strong doubts that it can even produce it

replies(1): >>44319323 #

3. ◴[19 Jun 25 12:23 UTC] No.44317975[source]▶

>>44317876 (TP) #

4. layer8 ◴[19 Jun 25 14:13 UTC] No.44318876[source]▶

>>44317876 (TP) #

The problem is, for any change, you have to understand the existing code base to assess the quality of the change in the four tries. This means you aren't relieved of having to be familiar with the code and review everything. For many developers this review-only work style isn't an exciting prospect.

And it will remain that way until you can delegate development tasks to AI with a 99+% success rate so that you don’t have to review their output and understand the code base anymore. At which point developers will become truly obsolete.

5. solaire_oa ◴[19 Jun 25 15:03 UTC] No.44319323[source]▶

>>44317938 #

Making 4 PRs for a well-known solution sounds insane, yes, but to be the devil's advocate, you could plausibly be working with an ambiguous task: "Create 4 PRs with 4 different dependency libraries, so that I can compare their implementations." Technically it wouldn't need to pick the best one.

I have apprehension about the future of software engineering, but comparison does technically seem like a valid use case.

6. solaire_oa ◴[19 Jun 25 15:11 UTC] No.44319399[source]▶

>>44317876 (TP) #

Top-tier professional programmer quality is exceedingly, impractically optimistic, for a few reasons.

1. There's a low probability of that in the first place.

2. You need to be a top-tier professional programmer to recognize that type of quality (i.e. a junior engineer could select one of the 3 shit PRs)

3. When it doesn't produce TTPPQ, you've wasted tons of time prompting and reviewing shit code and still need to deliver: a net negative.

I'm not doubting the utility of LLMs but the scattershot approach just feels like gambling to me.
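To put rough numbers on the gambling intuition, here is a back-of-the-envelope sketch. It assumes each PR is independently top-tier with probability `p_good`, and that the reviewer recognizes a top-tier PR with probability `p_recognize` when one exists. Both the model and the example values are made-up assumptions for illustration, not measurements.

```python
def p_at_least_one_good(p_good: float, n: int) -> float:
    """Probability that at least one of n independent attempts is top-tier."""
    return 1.0 - (1.0 - p_good) ** n


def p_ship_good(p_good: float, n: int, p_recognize: float) -> float:
    """Probability a top-tier PR exists AND the reviewer picks it.

    Simplifying assumption: recognition is a single independent event
    with probability p_recognize, given that a top-tier PR exists.
    """
    return p_at_least_one_good(p_good, n) * p_recognize


if __name__ == "__main__":
    # Hypothetical values chosen only to show how the terms interact.
    for p_good in (0.05, 0.25, 0.5):
        for p_recognize in (0.3, 0.9):
            chance = p_ship_good(p_good, n=4, p_recognize=p_recognize)
            print(f"p_good={p_good:.2f} p_recognize={p_recognize:.1f} -> {chance:.3f}")
```

Under this toy model, both points (1) and (2) multiply into the outcome: a low per-PR success rate or a reviewer who can't tell the difference each drags the overall odds down, and the cost of reviewing all four PRs is paid either way.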

replies(1): >>44320025 #

7. zelphirkalt ◴[19 Jun 25 16:15 UTC] No.44320025[source]▶

>>44319399 #

Also, as a consequence of (1): the LLMs are trained mostly on mediocre code, so they often output mediocre or bad solutions.
