Andrej Karpathy: Software in the era of AI [video]

(www.youtube.com)

abdullin ◴[19 Jun 25 07:03 UTC] No.44316210[source]▶

Tight feedback loops are the key to working productively with software. I see that in codebases of up to 700k lines of code (legacy 30yo 4GL ERP systems).

The best part is that AI-driven systems are fine with running even tighter loops than a sane human would tolerate.

E.g. running the full linting, testing and E2E/simulation suites after any minor change. Or generating 4 versions of PR for the same task so that the human could just pick the best one.
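
A minimal sketch of that kind of loop, assuming each check is just a command that exits non-zero on failure. The `run_checks` helper, the `CheckResult` type and the placeholder commands in `CHECKS` are hypothetical names used for illustration, not part of any particular tool:

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks):
    """Run each (name, argv) check in order and stop at the first failure."""
    results = []
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append(CheckResult(name, passed, proc.stdout + proc.stderr))
        if not passed:
            break  # report this failure before spending time on slower suites
    return results


# Placeholder commands so the sketch runs anywhere; swap in a real linter,
# test runner and E2E suite for an actual project.
CHECKS = [
    ("lint", [sys.executable, "-c", "print('lint ok')"]),
    ("tests", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("e2e", [sys.executable, "-c", "print('e2e ok')"]),
]

if __name__ == "__main__":
    for result in run_checks(CHECKS):
        print(result.name, "passed" if result.passed else "FAILED")
```

Stopping at the first failing check keeps each iteration short: the failure output can go straight back to the agent before the slower suites run.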

replies(7): >>44316306 #>>44316946 #>>44317531 #>>44317792 #>>44318080 #>>44318246 #>>44318794 #

latexr ◴[19 Jun 25 11:54 UTC] No.44317792[source]▶

>>44316210 #

> Or generating 4 versions of PR for the same task so that the human could just pick the best one.

That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality. Why are we doing this to ourselves and embracing it?

A few years ago, it would have been seen as a joke to say “the future of software development will be to have a million monkey interns banging on one million keyboards and submit a million PRs, then choose one”. Today, it’s lauded as a brilliant business and cost-saving idea.

We’re beyond doomed. The first major catastrophe caused by sloppy AI code can’t come soon enough. The sooner it happens, the better chance we have to self-correct.

replies(6): >>44317876 #>>44317884 #>>44317997 #>>44318175 #>>44318235 #>>44318625 #

koakuma-chan ◴[19 Jun 25 12:25 UTC] No.44317997[source]▶

>>44317792 #

> That sounds awful. A truly terrible and demotivating way to work and produce anything of real quality

This is the right way to work with generative AI, and it already is an extremely common and established practice when working with image generation.

replies(3): >>44318041 #>>44318110 #>>44318310 #

1. notTooFarGone ◴[19 Jun 25 12:32 UTC] No.44318041[source]▶

>>44317997 #

I can recognize images in one look.

How about that 400 Line change that touches 7 files?

replies(3): >>44318098 #>>44318227 #>>44318814 #

2. koakuma-chan ◴[19 Jun 25 12:38 UTC] No.44318098[source]▶

>>44318041 (TP) #

In my prompt, I ask the LLM to write a short summary of how it solved the problem. I run multiple instances of the LLM concurrently, compare their summaries, and use the output of whichever instance seems to have interpreted the instructions best or arrived at the best solution.
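
Roughly, this workflow could be sketched as follows. This is only an illustration under stated assumptions: `generate_candidate` is a hypothetical stand-in for one real LLM run, and `Candidate`, `generate_candidates` and `pick` are made-up names. The comparison of summaries stays a human decision here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Candidate:
    run_id: int
    summary: str  # the model's own description of how it solved the task
    patch: str    # the proposed change itself


def generate_candidate(task: str, run_id: int) -> Candidate:
    """Hypothetical stand-in for one LLM run; replace with a real client call."""
    return Candidate(
        run_id=run_id,
        summary=f"run {run_id}: approach {run_id % 2} for {task!r}",
        patch=f"# patch from run {run_id}\n",
    )


def generate_candidates(task: str, n: int) -> list[Candidate]:
    """Start n independent runs concurrently and return them in run order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda i: generate_candidate(task, i), range(n)))


def pick(candidates: list[Candidate], chosen_run_id: int) -> Candidate:
    """Return the candidate the reviewer selected after comparing summaries."""
    return next(c for c in candidates if c.run_id == chosen_run_id)


if __name__ == "__main__":
    candidates = generate_candidates("fix the date parser", 4)
    for c in candidates:
        print(c.summary)  # the reviewer reads these side by side
    print(pick(candidates, 2).patch)
```

Threads are enough for this because each run mostly waits on a remote call; the summaries are what gets reviewed, and only the chosen patch is read in full.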

replies(1): >>44318584 #

3. abdullin ◴[19 Jun 25 12:52 UTC] No.44318227[source]▶

>>44318041 (TP) #

Exactly!

This is why there has to be a "write me a detailed implementation plan" step in between. Which files is it going to change, how, what are the gotchas, which tests will be affected or added etc.

It is easier to review one document and point out missing bits than to chase the loose ends.

Once the plan is done and good, it is usually a smooth path to the PR.

replies(1): >>44318795 #

4. elt895 ◴[19 Jun 25 13:38 UTC] No.44318584[source]▶

>>44318098 #

And you trust that the summary matches what was actually done? Your experience with the level of LLMs' understanding of code changes must differ significantly from mine.

replies(1): >>44318628 #

5. koakuma-chan ◴[19 Jun 25 13:43 UTC] No.44318628{3}[source]▶

>>44318584 #

It matched every time so far.

6. bayindirh ◴[19 Jun 25 14:04 UTC] No.44318795[source]▶

>>44318227 #

So you can create buggier code, remixed from bits scraped from the internet, which you don't understand but which somehow works, rather than creating higher-quality, tighter code which takes the same amount of time to type? All the while offloading all the work to something else so your skills can atrophy at the same time?

Sounds like progress to me.

replies(1): >>44322806 #

7. mistersquid ◴[19 Jun 25 14:07 UTC] No.44318814[source]▶

>>44318041 (TP) #

> I can recognize images in one look.

> How about that 400 Line change that touches 7 files?

Karpathy discusses this discrepancy. In his estimation, LLMs currently do not have a UI comparable to a 1970s CLI. Today, LLMs output text, and text does not leverage the human brain’s ability to ingest visually coded information, literally, at a glance.

Karpathy surmises UIs for LLMs are coming and I suspect he’s correct.

replies(1): >>44319905 #

8. variadix ◴[19 Jun 25 16:02 UTC] No.44319905[source]▶

>>44318814 #

The thing required isn’t a GUI for LLMs, it’s a visual model of code that captures all the behavior and is a useful representation to a human. People have floated this idea before LLMs, but as far as I know there isn’t any real progress, probably because it isn’t feasible. There’s so much intricacy and detail in software (and getting it even slightly wrong can be catastrophic), any representation that can capture said detail isn’t going to be interpretable at a glance.

replies(2): >>44320927 #>>44322430 #

9. mistersquid ◴[19 Jun 25 17:52 UTC] No.44320927{3}[source]▶

>>44319905 #

> The thing required isn’t a GUI for LLMs, it’s a visual model of code that captures all the behavior and is a useful representation to a human.

The visual representation that would be useful to humans is what Karpathy means by “GUI for LLMs”.

10. skydhash ◴[19 Jun 25 20:50 UTC] No.44322430{3}[source]▶

>>44319905 #

There’s no visual model for code, as code isn’t 2D. There are two mechanisms in the Turing machine model: a state machine and a linear representation of code and data. The 2D representation of a state machine has no significance, and the linear aspect of code and data hides more dimensions. We have invented more abstractions, but nothing that maps to a visual representation.

11. abdullin ◴[19 Jun 25 21:46 UTC] No.44322806{3}[source]▶

>>44318795 #

Here is another way to look at the problem.

There is a team of 5 people who are passionate about their indigenous language and want to keep it from disappearing. They are using AI+Coding tools to:

(1) Process and prepare a ton of various datasets for training custom text-to-speech, speech-to-text models and wake word models (because foundational models don't know this language), along with the pipelines and tooling for the contributors.

(2) design and develop an embedded device (running ESP32-S3) to act as a smart speaker running on the edge

(3) design and develop a backend in Go to orchestrate hundreds of these speakers

(4) a whole bunch of Python agents (essentially glorified RAGs over folklore, stories)

(5) a set of websites for teachers to create course content and exercises, making them available to these edge devices

All that, just so that kids in a few hundred kindergartens and schools would be able to practice their own native language, listen to fairy tales, songs or ask questions.

This project was acknowledged by the UN (AI for Good programme). They are now extending their help to more disappearing languages.

None of that was possible before. This sounds like good progress to me.

Edit: added newlines.

replies(1): >>44325990 #

12. bayindirh ◴[20 Jun 25 09:27 UTC] No.44325990{4}[source]▶

>>44322806 #

What you are describing is another application. My comment was squarely aimed at "vibe coding".

Protecting and preserving dying languages and culture is a great application for natural language processing.

For the record, I'm against neither LLMs nor AI. What I'm primarily against is how LLMs are trained and how they use the internet via their agents: without giving any citations, stripping this information left and right, and crying "fair use!" in the process.

Also, Go and Python are nice languages (which I use), but there are other nice ways to build agents which also allow them to migrate, communicate and work in other cooperative or competitive ways.

So, AI is nice, LLMs are cool, but hyping something to earn money, deskilling people, and pointing to something which is ethically questionable and technically inferior as the only silver bullet is not.

IOW: we should handle this thing way more carefully and stop ripping off people's work in the name of "fair use" without consent. This is nuts.

Disclosure: I'm an HPC sysadmin sitting on top of a datacenter which runs some AI workloads, too.

replies(1): >>44336695 #

13. abdullin ◴[21 Jun 25 11:34 UTC] No.44336695{5}[source]▶

>>44325990 #

I think there are two different layers that get frequently mixed.

(1) LLMs as models - just the weights and an inference engine. These are just tools, like hammers. There is a wide variety of models, ranging from the transparent and useless IBM Granite models, to open-weights Llama/Qwen, to proprietary ones.

(2) AI products that are built on top of LLMs (agents, RAG, search, reasoning etc). This is how people decide to use LLMs.

How these products display results - with or without citations, with or without attribution - is determined by the product design.

It takes more effort to design a system that properly attributes all bits of information to their sources, but it is doable, as long as product teams are willing to invest that effort.
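
As a rough illustration of what that product-level choice can look like, here is a minimal sketch in which every retrieved passage carries its source and the rendering step adds numbered citations. `Passage` and `render_with_citations` are hypothetical names, and the pairing of each answer sentence with the passage that supports it is assumed to come from the retrieval step:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    text: str
    source: str  # where the passage came from, e.g. a URL or document title


def render_with_citations(claims):
    """Render (sentence, passage) pairs as text with numbered source markers."""
    sources = []  # unique sources in first-use order
    lines = []
    for sentence, passage in claims:
        if passage.source not in sources:
            sources.append(passage.source)
        marker = sources.index(passage.source) + 1
        lines.append(f"{sentence} [{marker}]")
    footer = [f"[{i}] {src}" for i, src in enumerate(sources, start=1)]
    return "\n".join(lines + [""] + footer)


if __name__ == "__main__":
    a = Passage("passage text A", "doc-a")
    b = Passage("passage text B", "doc-b")
    print(render_with_citations([("First.", a), ("Second.", b), ("Third.", a)]))
```

The point of the sketch is that attribution has to be carried through the pipeline as data; if the source is dropped at retrieval time, the display layer has nothing to cite.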
