
1479 points sandslash | 1 comments
tudorizer ◴[] No.44319472[source]
95% terrible expression of the landscape, 5% neatly dumbed-down analogies.

English is a terrible language for deterministic outcomes in complex/complicated systems. Vibe coders won't understand this until they are 2 years into building the thing.

LLMs have their merits and he sometimes alludes to them, although it almost feels accidental.

Also, you don't spend years studying computer science to learn the language/syntax, but rather the concepts and systems, which don't magically disappear with vibe coding.

This whole direction is a cheeky Trojan horse. A dramatic problem, hidden in a flashy solution, to which a fix will be upsold 3 years from now.

I'm excited to come back to this comment in 3 years.

replies(10): >>44319579 #>>44319777 #>>44320017 #>>44320108 #>>44320322 #>>44320523 #>>44320547 #>>44320613 #>>44320728 #>>44320743 #
diggan ◴[] No.44319579[source]
> English is a terrible language for deterministic outcomes in complex/complicated systems

You seem to be under the impression that Karpathy somehow alluded to or hinted at that in his talk, which suggests you haven't actually watched it, and makes your first point kind of weird.

I feel like one of the stronger points he made was that you cannot treat LLMs as something they're explicitly not, so why would anyone expect deterministic outcomes from them?

He's making the case for coding with LLMs, not for letting LLMs go off and write code by themselves ("vibe coding"), and for understanding how they work before attempting to do so.

replies(1): >>44319869 #
tudorizer ◴[] No.44319869[source]
I watched the entire talk, quite carefully. He explicitly states how excited he was about his tweet mentioning English.

The disclaimer you mention was indeed mentioned, although it's "in one ear, out the other" with most of his audience.

If I give you a glazed donut with a brief asterisk about how sugar can cause diabetes will it stop you from eating the donut?

You also expect deterministic outcomes when making analogies with power plants and fabs.

replies(3): >>44319978 #>>44320055 #>>44320091 #
pama ◴[] No.44320091[source]
Your experience with fabs must be somewhat limited if you think that the state of the art in fabs produces deterministic results. Please look up (or ask friends about) the typical yields and error-mitigation features of modern chips, and try to visualize whether determinism is possible when the density of circuits approaches levels that can no longer be inspected with regular optical microscopes. Modern chip fabrication is closer to LLM code in even more ways than what is presented in the video.
replies(2): >>44320233 #>>44320255 #
whilenot-dev ◴[] No.44320233[source]
> Modern chip fabrication is closer to LLM code

As is, I don't quite understand what you're getting at here. Please just think it through and tell us what would happen to the yield ratio if the software running on all those photolithography machines weren't deterministic.

replies(1): >>44320774 #
kadushka ◴[] No.44320774[source]
The output of a fab, just like the output of an LLM, is non-deterministic, but it is good enough, or is being optimized to be good enough.

Non-determinism is not the problem; it's the quality of the software that matters. You can repeatedly ask me to solve a particular leetcode puzzle, and every time I might output a slightly different version. That's fine as long as the code solves the problem.
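The leetcode analogy can be made concrete: the test suite, not the exact text of the code, is what pins down correctness. A minimal sketch (function names and the puzzle are illustrative), with two differently shaped solutions that both satisfy the same deterministic checks:

```python
# Two syntactically different solutions to the same "two sum" puzzle,
# as two samples from an LLM might produce. The shared assertions are
# the deterministic check that makes the non-determinism acceptable.

def two_sum_v1(nums, target):
    # Single-pass hash map.
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return None

def two_sum_v2(nums, target):
    # Brute force; different shape, same contract.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return None

# The same suite accepts both "versions" of the answer.
for solve in (two_sum_v1, two_sum_v2):
    assert solve([2, 7, 11, 15], 9) == [0, 1]
    assert solve([3, 2, 4], 6) == [1, 2]
    assert solve([1, 2], 10) is None
```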

The software running on the machines (or anywhere) just needs to be better (choose your metric here) than the software written by humans. Software written by GPT-4 is better than software written by GPT-3.5, and the software written by o3 is better than software written by GPT-4. That's just the improvement from the last 3 years, and there's a massive, trillion-dollar effort worldwide to continue the progress.

replies(1): >>44321106 #
whilenot-dev ◴[] No.44321106[source]
Hardware always involves some level of non-determinism, because the physical world is messier than the virtual software world. Every hardware engineer accepts that and learns how to design solutions despite those constraints. But you're right, non-determinism is not the current problem in some fabs, because the whole process has been modeled with it in mind; it's the yield ratio that needs to be deterministic enough to offer a service. Remember the struggles in Intel's fabs? Revenue at fabs reflects that.

The software quality at companies like ASML seems to be in bad shape already, and I remember ex-employees stating that there are some team leads higher up who can at least reason about existing software procedures, their implementation, side effects, and their outcomes. Do you think this software is as thoroughly documented as some open source project? The purchase costs for those machines are in the mid-3-digit million range (operating costs excluded), and they are expected to run 24/7 to be somewhat worthwhile. Operators can handle hardware issues on the spot and work around them, but what do you think happens with downtime due to non-deterministic software issues?

replies(1): >>44327001 #
pama ◴[] No.44327001[source]
The output of the Verilog optimizer is different every time. The output of a fab is different in every batch, and each chip in a batch differs from the others. Quality control drops the fraction of truly poor chips, and hardware design features might downgrade some of the partially failed chips to be classified as lesser versions of the same initial design. The final chips work as intended, mostly, but perhaps the error tolerance to overclocking or the mean time between failures differs slightly between chips. We can all work with them just fine almost all the time.

The same principles apply to complex LLM-orchestrated code projects. I don't mind if my compiler gives different code each time because it uses a stochastic optimizer, but I want my code to do what I want and to not fail more than a certain tolerance I have for this code, which depends on the application. By giving more insight into the layers of testing to more people, and by encouraging the new documentation practices that Andrej mentioned, LLM coding will change the practice of software engineering rather dramatically.

Code 2.0 was flexible and could yield results that were better than human-coded efforts for complex problems, but the architecture, code, and data were selected by humans. In Code 3.0, humans have access to (non-deterministic) building blocks that are written in natural language, with bug fixes and feature additions happening in a conversational style. Similar engineering principles as with Code 1.0 still apply (even more so than with Code 2.0, unless the product is a neural net), but the emphasis on verification has increased dramatically as a fraction of the total effort, even though the total effort has gone down a lot. I can't wait to see increased help in code verification efforts from this batch of people in the AI startup school as a result of Andrej's presentation.
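The "certain tolerance I have for this code" framing maps naturally onto statistical acceptance testing, much like yield QC in a fab: you don't demand identical outputs, you demand that the observed failure rate stays under an application-specific threshold. A hypothetical sketch (the flaky component and all thresholds are invented for illustration):

```python
import random

def flaky_solver(x, error_rate=0.02):
    # Stand-in for a non-deterministic building block (an LLM-generated
    # routine, or a chip coming off the line): usually right, rarely wrong.
    if random.random() < error_rate:
        return x * x + 1  # defective output
    return x * x

def acceptance_test(component, trials=2000, max_failure_rate=0.05):
    # Fab-style yield check: run many trials against a known-good answer
    # and accept the component if the failure rate is under tolerance.
    failures = sum(1 for _ in range(trials) if component(7) != 49)
    return failures / trials <= max_failure_rate

random.seed(0)  # make this sketch itself reproducible
assert acceptance_test(flaky_solver)          # ~2% failures, 5% tolerance
assert not acceptance_test(lambda x: x + 1)   # always-wrong component fails QC
```

The tolerance parameter is the application-dependent part: a throwaway script and a photolithography controller would set `max_failure_rate` very differently.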
The output of the verilog optimizer is different every time. The output of a fab is different in every batch. Each chip in a batch is different from others in that batch. Quality control drops the fraction of truly poor chips, and hardware design features might downgrade some of the partially failed chips to be classified as lesser versions of the same initial design. The final chips work as intended, mostly, but perhaps the error tolerance to overclocking or the mean time between failures is slightly different between chips. We can all work with them just fine almost all the time. The same principles apply to complex LLM-orchestrated code projects. I dont mind if my compiler gives different code each time because it uses a stochastic optimizer, but I want my code to do what I want and to not fail more than a certain tolerance I have for this code, which depends on the application. By giving more insight into the layers of testing to more people, and by encouraging the new documentation practices that Andrej mentioned, LLM coding will change the practice of software engineering rather dramatically. Code 2.0 was flexible and could yield results that were better than human coded efforts for complex problems, but the architecture, code, data, were selected by humans. In code 3.0 humans have access to (non-deterministic) building blocks that are written in natural language, to bug fixes and feature addition that happen in a conversation style. Similar engineering principles as with code 1.0 still apply (even more so than with code2.0, unless the product is a neural net), but the emphasis on verification increased dramatically as a fraction of the total effort, even though the total effort has gone down a lot. I can’t wait to see increased help in code verification efforts from this batch of people in the AI startup school as a result of Andrej’s presentation.