
My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)

(simonwillison.net)

AlexeyBrin ◴[29 Jul 25 14:02 UTC] No.44723521[source]▶

>>44723316 (OP) #

Most likely its training data included countless Space Invaders in various programming languages.

replies(6): >>44723664 #>>44723707 #>>44723945 #>>44724116 #>>44724439 #>>44724690 #

quantumHazer ◴[29 Jul 25 14:15 UTC] No.44723664[source]▶

>>44723521 #

And probably some of the synthetic data are generated copies of games already in the dataset?

I have this feeling about LLM-generated React frontends: they all look the same.

replies(4): >>44723867 #>>44724566 #>>44724902 #>>44731430 #

1. bayindirh ◴[29 Jul 25 14:29 UTC] No.44723867[source]▶

>>44723664 #

Last time somebody asked for a "premium camera app for iOS", and the model (re)generated Halide.

Models don't emit something they don't know. They remix and rewrite what they know. There's no invention, just recall...

replies(4): >>44724102 #>>44724181 #>>44724845 #>>44726775 #

2. FeepingCreature ◴[29 Jul 25 14:47 UTC] No.44724102[source]▶

>>44723867 (TP) #

True where trivial; where nontrivial, false.

Trivially, humans don't emit something they don't know either. You don't spontaneously figure out Javascript from first principles, you put together your existing knowledge into new shapes.

Nontrivially, LLMs can absolutely produce code for entirely new requirements. I've seen them do it many times. Will it be put together from smaller fragments? Yes, this is called "experience" or if the fragments are small enough, "understanding".

replies(2): >>44724137 #>>44724530 #

3. bayindirh ◴[29 Jul 25 14:50 UTC] No.44724137[source]▶

>>44724102 #

Humans can observe ants and invent ant colony optimization. AIs can’t.

Humans can explore what they don’t know. AIs can’t.

replies(5): >>44724200 #>>44724373 #>>44724567 #>>44724658 #>>44731957 #

4. satvikpendem ◴[29 Jul 25 14:54 UTC] No.44724181[source]▶

>>44723867 (TP) #

This doesn't make sense thermodynamically because models are far smaller than the training data they purport to hold and recall, so there must be some level of "understanding" going on. Whether that's the same as human understanding is a different matter.

replies(1): >>44726179 #

5. falcor84 ◴[29 Jul 25 14:56 UTC] No.44724200{3}[source]▶

>>44724137 #

What makes you categorically say that "AIs can't"?

Based on my experience with present-day AIs, I personally wouldn't be surprised at all if you showed Gemini 2.5 Pro a video of an insect colony, asked it "Take a look at the way they organize and see if that gives you inspiration for an optimization algorithm", and it spat something interesting out.

replies(1): >>44725223 #

6. FeepingCreature ◴[29 Jul 25 15:10 UTC] No.44724373{3}[source]▶

>>44724137 #

What makes you categorically say that "humans can"?

I couldn't do that with an ant colony. I would have to train on ant research first.

(Oh, and AIs can absolutely explore what they don't know. Watch a Claude Code instance look at a new repository. Exploration is a convergent skill in long-horizon RL.)

7. phkahler ◴[29 Jul 25 15:23 UTC] No.44724530[source]▶

>>44724102 #

> Nontrivially, LLMs can absolutely produce code for entirely new requirements. I've seen them do it many times.

I think most people writing software today are reinventing a wheel, even in corporate environments for internal tools. Everyone wants their own tweak or thinks their idea is unique and nobody wants to share code publicly, so everyone pays programmers to develop buggy bespoke custom versions of the same stuff that's been done 100 times before.

I guess what I'm saying is that your requirements are probably not new, and to the extent they are, yes, an LLM can fill in the blanks due to its fluency in languages.

replies(1): >>44743498 #

8. CamperBob2 ◴[29 Jul 25 15:25 UTC] No.44724567{3}[source]▶

>>44724137 #

That's what benchmarks like ARC-AGI are designed to test. The models are getting better at it, and you aren't.

Nothing ultimately matters in this business except the first couple of time derivatives.

9. ben_w ◴[29 Jul 25 15:33 UTC] No.44724658{3}[source]▶

>>44724137 #

> Humans can observe ants and invent ant colony optimization. AIs can’t.

Surely this is exactly what current AIs do? Observe stuff and apply that observation? Isn't this the exact criticism, that they aren't inventing ant colonies from first principles without ever seeing one?

> Humans can explore what they don’t know. AIs can’t.

We only learned to decode Egyptian hieroglyphs because of the Rosetta Stone. There's no translation for North Sentinelese, the Voynich manuscript, or Linear A.

We're not magic.

10. Uehreka ◴[29 Jul 25 15:48 UTC] No.44724845[source]▶

>>44723867 (TP) #

> Models don't emit something they don't know. They remix and rewrite what they know. There's no invention, just recall...

People really need to stop saying this. I get that it was the Smart Guy Thing To Say in 2023, but by this point it’s pretty clear that it’s not true in any way that matters for most practical purposes.

Coding LLMs have clearly been trained on conversations where a piece of code is shown, a transformation is requested (rewrite this from Python to Go), and then the transformed code is shown. It’s not that they’re just learning codebases, they’re learning what working with code looks like.

Thus you can ask an LLM to refactor a program in a language it has never seen, and it will “know” what refactoring means, because it has seen it done many times, and it will stand a good chance of doing the right thing.

That’s why they’re useful. They’re doing something way more sophisticated than just “recombining codebases from their training data”, and anyone chirping 2023 sound bites is going to miss that.

replies(2): >>44731840 #>>44739406 #

11. sarchertech ◴[29 Jul 25 16:18 UTC] No.44725223{4}[source]▶

>>44724200 #

It will 100% have something in its training set discussing a human doing this and will almost definitely spit out something similar.

replies(1): >>44732015 #

12. Eggpants ◴[29 Jul 25 17:37 UTC] No.44726179[source]▶

>>44724181 #

It’s a lossy text compression technique. It’s clever applied statistics. Basically an advanced association rules algorithm which has been around for decades but modified to consider order and relative positions.

There is no understanding, regardless of the wants of all the capital investors in this domain.
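
As a toy illustration of the "order-aware association statistics" description above, here is a bigram counter over a made-up corpus. The corpus and the function names are hypothetical, and real LLMs are neural networks trained by gradient descent rather than count tables, so treat this only as an analogy for the claim, not as how any actual model works.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each token directly follows each other token."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def predict_next(follows, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen as a predecessor."""
    if prev not in follows:
        return None
    return follows[prev].most_common(1)[0][0]


# Hypothetical training text; order matters because only adjacent pairs are counted.
CORPUS = "the alien moves left the alien moves down the player shoots".split()

if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(predict_next(model, "the"))
    print(predict_next(model, "shoots"))
```

A table like this can only return continuations it has literally counted, which is the behavior the comment attributes to LLMs; whether that attribution is fair is what the replies dispute.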

replies(3): >>44726653 #>>44726720 #>>44728418 #

13. simonw ◴[29 Jul 25 18:18 UTC] No.44726653{3}[source]▶

>>44726179 #

I don't care if it can "understand" anything, as long as I can use it to achieve useful things.

replies(1): >>44726747 #

14. ◴[29 Jul 25 18:23 UTC] No.44726720{3}[source]▶

>>44726179 #

15. Eggpants ◴[29 Jul 25 18:26 UTC] No.44726747{4}[source]▶

>>44726653 #

“useful things” like poorly drawing birds on bikes? ;)

(I have much respect for what you have done and are currently doing, but you did walk right into that one)

replies(1): >>44729114 #

16. mr_toad ◴[29 Jul 25 18:29 UTC] No.44726775[source]▶

>>44723867 (TP) #

> They remix and rewrite what they know. There's no invention, just recall...

If they only recalled they wouldn’t “hallucinate”. What’s a lie if not an invention? So clearly they can come up with data that they weren’t trained on, for better or worse.

replies(1): >>44727316 #

17. 0x457 ◴[29 Jul 25 19:26 UTC] No.44727316[source]▶

>>44726775 #

Because internally, there isn't a difference between a correctly "recalled" token and an incorrectly recalled (hallucinated) one.
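
A minimal sketch of the point above, assuming a toy four-word vocabulary and made-up scores (both hypothetical, not taken from any real model): the sampling step draws the next token from a probability distribution, and nothing in that step labels the result as recalled or hallucinated.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab, logits, rng):
    """Pick one token according to the model's probabilities.

    The same code path runs whether the chosen token turns out to be
    factually right or wrong; there is no "recalled" flag anywhere.
    """
    probs = softmax(logits)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical vocabulary and scores for the prompt "The capital of Australia is"
VOCAB = ["Canberra", "Sydney", "Melbourne", "Perth"]
LOGITS = [2.0, 1.5, 0.5, -1.0]

if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_next_token(VOCAB, LOGITS, rng) for _ in range(5)])
```

With these made-up scores the correct answer is the most likely token, but the wrong ones keep nonzero probability and are emitted by exactly the same mechanism.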

replies(1): >>44734656 #

18. CamperBob2 ◴[29 Jul 25 21:24 UTC] No.44728418{3}[source]▶

>>44726179 #

> It’s a lossy text compression technique.

That is a much, much bigger deal than you make it sound like.

Compression may, in fact, be all we need. For that matter, it may be all there is.

19. msephton ◴[29 Jul 25 22:43 UTC] No.44729114{5}[source]▶

>>44726747 #

The pelican on a bicycle is a very useful test.

replies(1): >>44733323 #

20. cztomsik ◴[30 Jul 25 07:58 UTC] No.44731840[source]▶

>>44724845 #

I don't know, I have mixed-bag experiences and it's not really improving. It greatly varies depending on the programming language and the kind of problem which I'm trying to solve.

The tasks where it works great are things I'd expect to be part of the dataset (GitHub, blog posts), or they are "classic" LM tasks (understand + copy-paste/patch). The actual intelligence, in my opinion, is still very limited. So while it's true it's not "just recall", it still might be "mostly recall".

BTW: Copy-paste is something which works great in any attention-based model. On the other hand, models like RWKV usually fail at it and are not suited for this IMHO (but I think they have much better potential for AGI).

21. numpad0 ◴[30 Jul 25 08:19 UTC] No.44731957{3}[source]▶

>>44724137 #

humans also eat

22. fc417fc802 ◴[30 Jul 25 08:32 UTC] No.44732015{5}[source]▶

>>44725223 #

That's a good point but all it means is that we can't test the hypothesis one way or the other due to never being entirely certain that a given task isn't anywhere in the training data. Supposing that "AIs can't" is then just as invalid as supposing that "AIs can".

23. dfedbeef ◴[30 Jul 25 12:28 UTC] No.44733323{6}[source]▶

>>44729114 #

Yeah what if you need a drawing of a pelican on a bicycle

24. pbhjpbhj ◴[30 Jul 25 14:24 UTC] No.44734656{3}[source]▶

>>44727316 #

Depends on the training? If there was e.g. RLHF, then those connections are stronger and more likely; that's a difference (but not a category difference).

replies(1): >>44759348 #

25. yencabulator ◴[30 Jul 25 20:57 UTC] No.44739406[source]▶

>>44724845 #

> It’s not that they’re just learning codebases, they’re learning what working with code looks like.

Working in any not-in-training-set environment very quickly shows the shortcomings of this belief.

For example, Cloudflare Workers is V8 but it sure ain't Node, and the local sqlite in a Durable Object has a sync API with very different guarantees than a typical client-server SQL setup.

Even in a more standard setting, it's really hard to even get an LLM to use the current-stable APIs when its training data contains now-deprecated examples. Your local rules, llms.txt mentions, corrections etc slip out of the context pretty fast and it goes back to trained data.

The LLM can perhaps "read any code" but it really really prefers writing only code that was in its training set.

26. FeepingCreature ◴[31 Jul 25 08:12 UTC] No.44743498{3}[source]▶

>>44724530 #

Nothing is truly and completely new. I'm not formulating my requirements in an extinct language. My point is that "filling in the blanks" and "doing new things" are on a spectrum.

LLMs have their limits, but they really can understand and productively contribute to programs that achieve a purpose that no program on the internet has done yet. What they are doing is not interpolation at the highest level. It may be interpolation/extrapolation at a lower level, but this goes for any skill learnt by anyone ever.

27. 0x457 ◴[01 Aug 25 16:53 UTC] No.44759348{4}[source]▶

>>44734656 #

Yes, but I thought we were talking about a category difference.

Proper RLHF surely boosts "predicted next token until it couldn't" to feel more like "actually recalled".
