
577 points simonw | 1 comment | source
bgwalter ◴[] No.44724997[source]
The GLM-4.5 model utterly fails at creating ASCII art or factorizing numbers. It can "write" Space Invaders because there are literally thousands of open source projects out there.

This is another example of LLMs being dumb copiers that do not understand human prompts.

But there is one positive side to this: if this photocopying business can be run locally, the stocks of OpenAI etc. should go to zero.

replies(1): >>44725037 #
simonw ◴[] No.44725037[source]
Why would you use an LLM to factorize numbers?
replies(1): >>44725141 #
bgwalter ◴[] No.44725141[source]
Because we are told that they can solve IMO problems. Yet they fail at basic math problems, not only at factorization but also when probed with relatively basic symbolic math that would not require invoking an external program.

Also, if they fail they could say so instead of giving a hallucinated answer. First the models lie and say that factoring a 20-digit number takes vast amounts of computing. Then, if pointed to a factorization program, they pretend to execute it and lie about the output.
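
For reference, the external-program route is trivial. A minimal sketch, assuming Python with the sympy library installed (factorint, expand, and symbols are standard sympy functions):

    # Assumes sympy is installed (pip install sympy).
    # factorint() returns the prime factorization as a {prime: exponent} dict.
    from sympy import factorint, expand, symbols

    n = 2**64 + 1  # a 20-digit number, known to factor as 274177 * 67280421310721
    print(factorint(n))  # -> {274177: 1, 67280421310721: 1}

    # Basic symbolic math of the kind in question is also a one-liner:
    x = symbols("x")
    print(expand((x + 1)**3))  # -> x**3 + 3*x**2 + 3*x + 1

The point is that a model only needs to call such a tool, not reinvent it; factorint typically handles numbers of this size in a fraction of a second.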

There is no intelligence or flexibility apart from stealing other people's open source code.

replies(1): >>44725261 #
simonw ◴[] No.44725261[source]
That's why the IMO results were so notable: that was one of those moments where new models were demonstrated doing something that they had previously been unable to do.
replies(2): >>44725609 #>>44728194 #
bgwalter ◴[] No.44728194[source]
The results were private and the methodology was not revealed. Even Tao, who was bullish on "AI", is starting to question the process.
replies(1): >>44728474 #
simonw ◴[] No.44728474[source]
The same result has also been achieved by a Google DeepMind team and by at least one group of independent researchers using publicly available models and careful prompting tricks.