Andrej Karpathy: Software in the era of AI [video]

(www.youtube.com)

1479 points sandslash | 1 comments | 19 Jun 25 00:33 UTC


mentalgear ◴[19 Jun 25 09:33 UTC] No.44316934[source]▶

Meanwhile, this morning I asked Claude 4 to write a simple EXIF normalizer. After two rounds of prompting it to double-check its code, I still had to point out that it makes no sense to load the entire image for reorienting if the EXIF orientation is fine in the first place.
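As a rough illustration of that point: the EXIF orientation tag can be read from the JPEG header bytes alone, so the full decode can be skipped when the tag is absent or already 1. The sketch below is stdlib-only Python and is not the code Claude produced; the names `read_exif_orientation` and `needs_reorientation` are hypothetical, and it assumes a well-formed JPEG with EXIF in an APP1 segment, ignoring edge cases such as fill bytes between segments.

```python
import struct
from typing import Optional

ORIENTATION_TAG = 0x0112  # EXIF/TIFF tag id for Orientation
TYPE_SHORT = 3            # TIFF field type: 16-bit unsigned


def _orientation_from_tiff(tiff: bytes) -> Optional[int]:
    """Scan IFD0 of a TIFF block for the Orientation tag."""
    if len(tiff) < 8:
        return None
    if tiff[:2] == b"II":      # little-endian
        endian = "<"
    elif tiff[:2] == b"MM":    # big-endian
        endian = ">"
    else:
        return None
    magic, ifd_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42 or ifd_offset + 2 > len(tiff):
        return None
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = tiff[start:start + 12]
        if len(entry) < 12:
            return None
        tag, field_type, n = struct.unpack(endian + "HHI", entry[:8])
        if tag == ORIENTATION_TAG and field_type == TYPE_SHORT and n == 1:
            (value,) = struct.unpack(endian + "H", entry[8:10])
            return value
    return None


def read_exif_orientation(data: bytes) -> Optional[int]:
    """Return the EXIF orientation from JPEG bytes, or None if not found.

    Only segment headers are walked; no pixel data is decoded.
    """
    if data[:2] != b"\xff\xd8":          # missing SOI marker: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):       # start of scan / end of image
            return None
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + seg_len]
        if marker == 0xE1 and payload[:6] == b"Exif\x00\x00":
            return _orientation_from_tiff(payload[6:])
        pos += 2 + seg_len
    return None


def needs_reorientation(data: bytes) -> bool:
    """True only when the tag is present and not the normal value 1."""
    orientation = read_exif_orientation(data)
    return orientation is not None and 2 <= orientation <= 8
```

A caller would pass only the first chunk of the file (EXIF sits near the start) to `needs_reorientation` and hand the image to a real decoder only when it returns True.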

Vibe vs reality, and anyone actually working in the space daily can attest to how brittle these systems are.

Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far too complex to simulate in its vastness.

replies(7): >>44317104 #>>44317116 #>>44317136 #>>44317214 #>>44317305 #>>44317622 #>>44317741 #

sensanaty ◴[19 Jun 25 10:41 UTC] No.44317305[source]▶

>>44316934 #

There's also those instances where Microsoft unleashed Copilot on the .NET repo, and it resulted in the most hilariously terrible PRs that required the maintainers to basically tell Copilot every single step it should take to fix the issue. They were basically writing the PRs themselves at that point, except doing it through an intermediary that was much dumber, slower and less practical than them.

And don't get me started on my own experiences with these things, and no, I'm not a luddite, I've tried my damndest and have followed all the cutting-edge advice you see posted on HN and elsewhere.

Time and time again, the reality of these tools falls flat on its face while people like Andrej hype things up as if we're 5 minutes away from having Claude become Skynet or whatever, or as he puts it, before we enter the world of "Software 3.0" (coincidentally totally unrelated to Web 3.0 and the grift we had to endure there, I'm sure).

To intercept the common arguments,

- no I'm not saying LLMs are useless or have no use cases

- yes there's a possibility if you extrapolate from current trends (https://xkcd.com/605/) that they indeed will be Skynet

- yes I've tried the latest and greatest model released 7 minutes ago to the best of my ability

- yes I've tried giving it prompts so detailed a literal infant could follow along and accomplish the task

- yes I've fiddled with providing it more/less context

- yes I've tried keeping it to a single chat rather than multiple chats, as well as vice versa

- yes I've tried Claude Code, Gemini Pro 2.5 With Deep Research, Roocode, Cursor, Junie, etc.

- yes I've tried having 50 different "agents" running and only choosing the best output from the lot.

I'm sure there's a new gotcha being written up as we speak, probably something along the lines of "Well for me it doubled my productivity!" and that's great, I'm genuinely happy for you if that's the case, but for me and my team who have been trying diligently to use these tools for anything that wasn't a microscopic toy project, it has fallen apart time and time again.

The idea of an application UI or god forbid an entire fucking Operating System being run via these bullshit generators is just laughable to me, it's like I'm living on a different planet.

replies(5): >>44317421 #>>44317440 #>>44317630 #>>44317721 #>>44318531 #

1. ffsm8 ◴[19 Jun 25 11:04 UTC] No.44317440[source]▶

>>44317305 #

Unironically, your comment mirrors my opinion as of last month.

Since then I've given it another try last week and was quite literally mind-blown by how much it improved in the context of vibe coding (Claude Code). It actually improved so much that I thought "I would like to try that on my production codebase" (mostly because I want it to fail, because that's my job ffs), but alas - that's not allowed at my day job.

From the limited experience I could gather over the last week as a software dev with over 10 yrs of experience (along with another 5-10 doing it as a hobby before employment) I can say that I expect our industry to get absolutely destroyed within the next 5 yrs.

The skill ceiling is going to get mostly squashed for 90% of devs, and this will inevitably destroy our collective bargaining positions, including for the last 10%, because the competition around these positions will be even more fierce.

It's already starting, even if it's currently very misguided and mostly down to short-sightedness.

But considering the trajectory and looking at how naive current LLM coding tools are... Once the industry adjusts and better tooling is pioneered... it's gonna get brutal.

And it's most certainly not limited to software engineering. Pretty much all desk jobs will get hemorrhaged as soon as an LLM player basically replaces SAP with entirely new tooling.

Frankly, I expect this to go bad, very very quickly. But I'm still hoping for a good ending.
