My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)

(simonwillison.net)

577 points simonw | 1 comments | 29 Jul 25 13:45 UTC | HN request time: 0s | source

Show context

AlexeyBrin ◴[29 Jul 25 14:02 UTC] No.44723521[source]▶

>>44723316 (OP) #

Most likely its training data included countless Space Invaders in various programming languages.

replies(6): >>44723664 #>>44723707 #>>44723945 #>>44724116 #>>44724439 #>>44724690 #

NitpickLawyer ◴[29 Jul 25 14:19 UTC] No.44723707[source]▶

>>44723521 #

This comment is ~3 years late. Every model since gpt3 has had the entirety of available code in their training data. That's not a gotcha anymore.

We went from chatgpt's "oh, look, it looks like python code but everything is wrong" to "here's a full stack boilerplate app that does what you asked and works in 0-shot" inside 2 years. That's the kicker. And the sauce isn't just in the training set, models now do post-training and RL and a bunch of other stuff to get to where we are. Not to mention the insane abilities with extended context (first models were 2/4k max), agentic stuff, and so on.

These kinds of comments are really missing the point.

replies(7): >>44723808 #>>44723897 #>>44724175 #>>44724204 #>>44724397 #>>44724433 #>>44729201 #

jan_Sate ◴[29 Jul 25 14:56 UTC] No.44724204[source]▶

>>44723707 #

Not exactly. The real utility value of LLM for programming is to come up with something new. For Space Invaders, instead of using LLM for that, I might as well just manually search for the code online and use that.

To show that LLM actually can provide value for one-shot programming, you need to find a problem that there's no fully working sample code available online. I'm not trying to say that LLM couldn't to that. But just because LLM can come up with a perfectly-working Space Invaders doesn't mean that it could do that.

replies(2): >>44724519 #>>44724841 #

devmor ◴[29 Jul 25 15:22 UTC] No.44724519[source]▶

>>44724204 #

> The real utility value of LLM for programming is to come up with something new.

That's the goal for these projects anyways. I don't know that its true or feasible. I find the RAG models much more interesting myself, I see the technology as having far more value in search than generation.

Rather than write some markov-chain reminiscent frankenstein function when I ask it how to solve a problem, I would like to see it direct me to the original sources it would use to build those tokens, so that I can see their implementations in context and use my judgement.

replies(1): >>44724556 #

simonw ◴[29 Jul 25 15:24 UTC] No.44724556[source]▶

>>44724519 #

"I would like to see it direct me to the original sources it would use to build those tokens"

Sadly that's not feasible with transformer-based LLMs: those original sources are long gone by the time you actually get to use the model, scrambled a billion times into a trained set of weights.

One thing that helped me understand this is understanding that every single token output by an LLM is the result of a calculation that considers all X billion parameters that are baked into that model (or a subset of that in the case of MoE models, but it's still billions of floating point calculations for every token.)

You can get an imitation of that if you tell the model "use your search tool and find example code for this problem and build new code based on that", but that's a pretty unconventional way to use a model. A key component of the value of these things is that they can spit out completely new code based on the statistical patterns they learned through training.

replies(1): >>44724604 #

devmor ◴[29 Jul 25 15:28 UTC] No.44724604[source]▶

>>44724556 #

I am aware, and that's exactly why I don't think they're anywhere near as useful for this type of work as the people pushing them want them to be.

I tried to push for this type of model when an org I worked with over a decade ago was first exploring using the first generation of Tensorflow to drive customer service chatbots and was sadly ignored.

replies(1): >>44724629 #

simonw ◴[29 Jul 25 15:31 UTC] No.44724629[source]▶

>>44724604 #

I don't understand. For code, why would I want to remix existing code snippets?

I totally get the value of RAG style patterns for information retrieval against factual information - for those I don't want the LLM to answer my question directly, I want it to run a search and show me a citation and directly quote a credible source as part of answering.

For code I just want code that works - I can test it myself to make sure it does what it's supposed to.

replies(1): >>44724850 #

devmor ◴[29 Jul 25 15:49 UTC] No.44724850[source]▶

>>44724629 #

> I don't understand. For code, why would I want to remix existing code snippets?

That is what you're doing already. You're just relying on a vector compression and search engine to hide it from you and hoping the output is what you expect, instead of having it direct you to where it remixed those snippets from so you can see how they work to start with and make sure its properly implemented from the get-go.

We all want code that works, but understanding that code is a critical part of that for anything but a throw-away one time use script.

I don't really get this desire to replace critical thought with hoping and testing. It sounds like the pipe dream of a middle manager, not a tool for a programmer.

replies(1): >>44725565 #

stavros ◴[29 Jul 25 16:43 UTC] No.44725565[source]▶

>>44724850 #

I don't understand your point. You seem to be saying that we should be getting code from the source, then adapting it to our project ourselves, instead of getting adapted code to begin with.

I'm going to review the code anyway, why would I not want to save myself some of the work? I can "see how they work" after the LLM gives them to me just fine.

replies(1): >>44726844 #

devmor ◴[29 Jul 25 18:37 UTC] No.44726844[source]▶

>>44725565 #

The work that you are "saving" is the work of using your brain to determine the solution to the problem. Whatever the LLM gives you doesn't have a context it is used in other than your prompt - you don't even know what it does until after you evaluate it.

If you instead have a set of sources related to your problem, they immediately come with context, usage and in many cases, developer notes and even change history to show you mistakes and adaptations.

You're ultimately creating more work for yourself* by trying to avoid work, and possibly ending up with an inferior solution in the process. Where is your sense of efficiency? Where is your pride as a intellectual?

* Yes, you are most likely creating more work for yourself even if you think you are capable of telling otherwise. [1]

1. https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...

replies(2): >>44726914 #>>44727149 #

1. stavros ◴[29 Jul 25 18:44 UTC] No.44726914[source]▶

>>44726844 #

Thanks for the concern, but I'm perfectly able to judge for myself whether I'm creating more work or delivering an inferior product.

↑