My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)

(simonwillison.net)

577 points simonw | 1 comments | 29 Jul 25 13:45 UTC | HN request time: 0s | source

Show context

AlexeyBrin ◴[29 Jul 25 14:02 UTC] No.44723521[source]▶

>>44723316 (OP) #

Most likely its training data included countless Space Invaders in various programming languages.

replies(6): >>44723664 #>>44723707 #>>44723945 #>>44724116 #>>44724439 #>>44724690 #

NitpickLawyer ◴[29 Jul 25 14:19 UTC] No.44723707[source]▶

>>44723521 #

This comment is ~3 years late. Every model since gpt3 has had the entirety of available code in their training data. That's not a gotcha anymore.

We went from chatgpt's "oh, look, it looks like python code but everything is wrong" to "here's a full stack boilerplate app that does what you asked and works in 0-shot" inside 2 years. That's the kicker. And the sauce isn't just in the training set, models now do post-training and RL and a bunch of other stuff to get to where we are. Not to mention the insane abilities with extended context (first models were 2/4k max), agentic stuff, and so on.

These kinds of comments are really missing the point.

replies(7): >>44723808 #>>44723897 #>>44724175 #>>44724204 #>>44724397 #>>44724433 #>>44729201 #

haar ◴[29 Jul 25 14:26 UTC] No.44723808[source]▶

>>44723707 #

I've had little success with Agentic coding, and what success I have had has been paired with hours of frustration, where I'd have been better off doing it myself for anything but the most basic tasks.

Even then, when you start to build up complexity within a codebase - the results have often been worse than "I'll start generating it all from scratch again, and include this as an addition to the initial longtail specification prompt as well", and even then... it's been a crapshoot.

I _want_ to like it. The times where it initially "just worked" felt magical and inspired me with the possibilities. That's what prompted me to get more engaged and use it more. The reality of doing so is just frustrating and wishing things _actually worked_ anywhere close to expectations.

replies(1): >>44724064 #

aschobel ◴[29 Jul 25 14:43 UTC] No.44724064[source]▶

>>44723808 #

Bingo, it's magical but the learning curve is very very steep. The METR study on open-source productivity alluded to this a bit.

I am definitely at a point where I am more productive with it, but it took a bunch of effort.

replies(2): >>44724470 #>>44724770 #

devmor ◴[29 Jul 25 15:18 UTC] No.44724470[source]▶

>>44724064 #

The subjects in the study you are referencing also believed that they were more productive with it. What metrics do you have to convince yourself you aren't under the same illusionary bias they were?

replies(1): >>44724497 #

simonw ◴[29 Jul 25 15:20 UTC] No.44724497[source]▶

>>44724470 #

Yesterday I used ffmpeg to extract the frame at the 13 second mark of a video out as a JPEG.

If I didn't have an LLM to figure that out for me I wouldn't have done it at all.

replies(4): >>44724574 #>>44724628 #>>44724962 #>>44733418 #

throwworhtthrow ◴[29 Jul 25 15:58 UTC] No.44724962[source]▶

>>44724497 #

LLM's still give subpar results with ffmpeg. For example when I asked Sonnet to trim a long video with ffmpeg, it put the input file parameter before the start time parameter, which triggers an unnecessary decode of the video file. [1]

Sure, use the LLM to get over the initial hump. But ffmpeg's no exception to the rule that LLM's produce subpar code. It's worth spending a couple minutes reading the docs to understand what it did so you can do it better, and unassisted, next time.

[1] https://ffmpeg.org/ffmpeg.html#:~:text=ss%20position

replies(1): >>44725343 #

CamperBob2 ◴[29 Jul 25 16:27 UTC] No.44725343[source]▶

>>44724962 #

That says more about suboptimal design on ffmpeg's part than it does about the LLM. Most humans can't deal with ffmpeg command lines, so it's not surprising that the LLM misses a few tricks.

replies(1): >>44725912 #

nottorp ◴[29 Jul 25 17:13 UTC] No.44725912{3}[source]▶

>>44725343 #

Had a LLM generate 3 lines of working C++ code that was "only" one order of magnitude slower than what i edited the code to in 10 minutes.

If you're happy with results like that, sure, LLMs miss "a few tricks"...

replies(1): >>44726406 #

ben_w ◴[29 Jul 25 17:56 UTC] No.44726406{4}[source]▶

>>44725912 #

You don't have to leave LLM code alone, it's fine to change it — unless, I guess, you're doing some kind of LLM vibe-code-golfing?

But this does remind me of a previous co-worker. Wrote something to convert from a custom data store to a database, his version took 20 minutes on some inputs. Swore it couldn't possibly be improved. Obviously ridiculous because it didn't take 20 minutes to load from the old data store, nor to load from the new database. Over the next few hours of looking at very mediocre code, I realised it was doing an unnecessary O(n^2) check, confirmed with the CTO it wasn't business-critical, got rid of it, and the same conversion on the same data ran in something like 200ms.

Over a decade before LLMs.

replies(1): >>44726438 #

nottorp ◴[29 Jul 25 17:59 UTC] No.44726438{5}[source]▶

>>44726406 #

We all do that, sometimes where it’s time critical sometimes where it isn’t.

But I keep being told “AI” is the second coming of Ahura Mazda so it shouldn’t do stuff like that right?

replies(2): >>44726777 #>>44727506 #

ben_w ◴[29 Jul 25 19:47 UTC] No.44727506{6}[source]▶

>>44726438 #

> Ahura Mazda

Niche reference, I like it.

But… I only hear of scammers who say, and psychosis sufferers who think, LLMs are *already* that competent.

Future AI? Sure, lots of sane-seeming people also think it could go far beyond us. Special purpose ones have in very narrow domains. But current LLMs are only good enough to be useful and potentially economically disruptive, they're not even close to wildly superhuman like Stockfish is.

replies(1): >>44728116 #

CamperBob2 ◴[29 Jul 25 20:49 UTC] No.44728116{7}[source]▶

>>44727506 #

Sure. If you ask ChatGPT to play chess, it will put up an amateur-level effort at best. Stockfish will indeed wipe the floor with it. But what happens when you ask Stockfish to write a Space Invaders game?

ChatGPT will get better at chess over time. Stockfish will not get better at anything except chess. That's kind of a big difference.

replies(1): >>44728303 #

ben_w ◴[29 Jul 25 21:11 UTC] No.44728303{8}[source]▶

>>44728116 #

> ChatGPT will get better at chess over time

Oddly, LLMs got worse at specifically chess: https://dynomight.net/chess/

But even to the general point, there's absolutely no agreement how much better the current architectures can ultimately get, nor how quickly they can get there.

Do they have potential for unbounded improvements, albeit at exponential cost for each linear incremental improvement? Or will they asymptomatically approach someone with 5 years experience, 10 years experience, a lifetime of experience, or a higher level than any human?

If I had to bet, I'd say current models have an asymptomatic growth converging to a merely "ok" performance; and separately claim that even if they're actually unbounded with exponential cost for linear returns, we can't afford the training cost needed to make them act like someone with even just 6 years professional experience in any given subject.

Which is still a lot. Especially as it would be acting like it had about as much experience in every other subject at the same time. Just… not a literal Ahura Mazda.

replies(1): >>44728752 #

CamperBob2 ◴[29 Jul 25 21:59 UTC] No.44728752{9}[source]▶

>>44728303 #

If I had to bet, I'd say current models have an asymptomatic growth converging to a merely "ok" performance

(Shrug) People with actual money to spend are betting twelve figures that you're wrong.

Should be fun to watch it shake out from up here in the cheap seats.

replies(2): >>44728976 #>>44732191 #

1. ben_w ◴[29 Jul 25 22:26 UTC] No.44728976{10}[source]▶

>>44728752 #

Nah, trillion dollars is about right for "ok". Percentage point of the global economy in cost, automate 2 percent and get a huge margin. We literally set more than that on actual fire each year.

For "pretty good", it would be worth 14 figures, over two years. The global GDP is 14 figures. Even if this only automated 10% of the economy, it pays for itself after a decade.

For "Ahura Mazda", it would easily be worth 16 figures, what with that being the principal God and god of the sky in Zoroastrianism, and the only reason it stops at 16 is the implausibility of people staying organised for longer to get it done.

↑