My 2.5 year old laptop can write Space Invaders in JavaScript now (GLM-4.5 Air)

(simonwillison.net)

Show context

AlexeyBrin ◴[29 Jul 25 14:02 UTC] No.44723521[source]▶

>>44723316 (OP) #

Most likely its training data included countless Space Invaders in various programming languages.

replies(6): >>44723664 #>>44723707 #>>44723945 #>>44724116 #>>44724439 #>>44724690 #

NitpickLawyer ◴[29 Jul 25 14:19 UTC] No.44723707[source]▶

>>44723521 #

This comment is ~3 years late. Every model since gpt3 has had the entirety of available code in their training data. That's not a gotcha anymore.

We went from chatgpt's "oh, look, it looks like python code but everything is wrong" to "here's a full stack boilerplate app that does what you asked and works in 0-shot" inside 2 years. That's the kicker. And the sauce isn't just in the training set, models now do post-training and RL and a bunch of other stuff to get to where we are. Not to mention the insane abilities with extended context (first models were 2/4k max), agentic stuff, and so on.

These kinds of comments are really missing the point.

replies(7): >>44723808 #>>44723897 #>>44724175 #>>44724204 #>>44724397 #>>44724433 #>>44729201 #

haar ◴[29 Jul 25 14:26 UTC] No.44723808[source]▶

>>44723707 #

I've had little success with Agentic coding, and what success I have had has been paired with hours of frustration, where I'd have been better off doing it myself for anything but the most basic tasks.

Even then, when you start to build up complexity within a codebase - the results have often been worse than "I'll start generating it all from scratch again, and include this as an addition to the initial longtail specification prompt as well", and even then... it's been a crapshoot.

I _want_ to like it. The times where it initially "just worked" felt magical and inspired me with the possibilities. That's what prompted me to get more engaged and use it more. The reality of doing so is just frustrating and wishing things _actually worked_ anywhere close to expectations.

replies(1): >>44724064 #

aschobel ◴[29 Jul 25 14:43 UTC] No.44724064[source]▶

>>44723808 #

Bingo, it's magical but the learning curve is very very steep. The METR study on open-source productivity alluded to this a bit.

I am definitely at a point where I am more productive with it, but it took a bunch of effort.

replies(2): >>44724470 #>>44724770 #

devmor ◴[29 Jul 25 15:18 UTC] No.44724470[source]▶

>>44724064 #

The subjects in the study you are referencing also believed that they were more productive with it. What metrics do you have to convince yourself you aren't under the same illusionary bias they were?

replies(1): >>44724497 #

simonw ◴[29 Jul 25 15:20 UTC] No.44724497[source]▶

>>44724470 #

Yesterday I used ffmpeg to extract the frame at the 13 second mark of a video out as a JPEG.

If I didn't have an LLM to figure that out for me I wouldn't have done it at all.

replies(4): >>44724574 #>>44724628 #>>44724962 #>>44733418 #

devmor ◴[29 Jul 25 15:26 UTC] No.44724574[source]▶

>>44724497 #

You wouldn't have just typed "extract frame at timestamp as jpeg ffmpeg" into Google and used the StackExchange result that comes up first that gives you a command to do exactly that?

replies(1): >>44724615 #

simonw ◴[29 Jul 25 15:29 UTC] No.44724615[source]▶

>>44724574 #

Before LLMs made ffmpeg no-longer-frustrating-to-use I genuinely didn't know that ffmpeg COULD do things like that.

replies(1): >>44726857 #

1. devmor ◴[29 Jul 25 18:38 UTC] No.44726857[source]▶

>>44724615 #

I'm not really sure what you're saying an LLM did in this case. Inspired a lost sense of curiosity?

replies(3): >>44727355 #>>44727394 #>>44727490 #

2. Philpax ◴[29 Jul 25 19:30 UTC] No.44727355[source]▶

>>44726857 (TP) #

Translated a vague natural language query ("cli, extract frame 13s into video") into something immediately actionable with specific examples and explanations, surfacing information that I would otherwise not know how to search for.

That's what I've done with my ffmpeg LLM queries, anyway - can't speak for simonw!

replies(1): >>44728128 #

3. 0x457 ◴[29 Jul 25 19:34 UTC] No.44727394[source]▶

>>44726857 (TP) #

LLM somewhat understood ffmpeg documentation? Not sure what is not clear here.

4. simonw ◴[29 Jul 25 19:45 UTC] No.44727490[source]▶

>>44726857 (TP) #

My general point is that people say things like "yeah, but this one study showed that programmers over-estimate the productivity gain they get from LLMs so how can you really be sure?"

Meanwhile I've spent the past two years constantly building and implementing things I never would have done because of the reduction in friction LLM assistance gives me.

I wrote about this first two years ago - AI-enhanced development makes me more ambitious with my projects - https://simonwillison.net/2023/Mar/27/ai-enhanced-developmen... - when I realized I was hacking on things with tech like AppleScript and jq that I'd previously avoided.

It's hard to measure the productivity boost you get from "wouldn't have built that thing" to "actually built that thing".

replies(1): >>44739153 #

5. wizzwizz4 ◴[29 Jul 25 20:50 UTC] No.44728128[source]▶

>>44727355 #

DuckDuckGo search results for "cli, extract frame 13s into video" (no quotes):

• https://stackoverflow.com/questions/10957412/fastest-way-to-...

• https://superuser.com/questions/984850/linux-how-to-extract-...

• https://www.aleksandrhovhannisyan.com/notes/video-cli-cheat-...

• https://www.baeldung.com/linux/ffmpeg-extract-video-frames

• https://ottverse.com/extract-frames-using-ffmpeg-a-comprehen...

Search engines have been able to translate "vague natural language queries" into search results for a decade, now. This pre-existing infrastructure accounts for the vast majority of ChatGPT's apparent ability to find answers.

replies(1): >>44729497 #

6. stelonix ◴[29 Jul 25 23:40 UTC] No.44729497{3}[source]▶

>>44728128 #

Yet the interface is fundamentally different, the output feels much more like bro pages[0] and it's within a click of clipboarding, one CTRL V away from extracting the 13th second screenshot. I've been using Google the past 24 years and my google-fu has always left people amazed; yet I can no longer bother to go through Stack Exchange's results when an LLM not only spits it out so nicely, but also does the equivalent of a explainshell[1].

Not comparable and I fail to see why going through Google's ads/results would be better?

[0] https://github.com/pombadev/bropages

[1] https://github.com/idank/explainshell

replies(1): >>44731050 #

7. wizzwizz4 ◴[30 Jul 25 05:11 UTC] No.44731050{4}[source]▶

>>44729497 #

DuckDuckGo insists on shoving "AI Assist" entries in its results, so I have a reasonable idea of how often LLMs are completely wrong even given search results. The answer's still "more than one time in five".

I did not suggest using Google Search (the company's on record as deliberately making Google Search worse), but there are other search engines. My preferred search engines don't do the fancy "interpret natural language queries" pre-processing, because I'm quite good at doing that in my head and often want to research niche stuff, but there are many still-decent search engines that do, and don't have ads in the results.

Heck, you can even pay for a good search engine! And you can have it redirect you to the relevant section of the top search result automatically: Google used to call this "I'm feeling lucky!" (although it was before URI text fragments, so it would just send you to the top of the page). All the properties you're after, much more cheaply, and you keep the information about provenance, and your answer is more-reliably accurate.

replies(1): >>44736573 #

8. delian66 ◴[30 Jul 25 16:53 UTC] No.44736573{5}[source]▶

>>44731050 #

> Heck, you can even pay for a good search engine!

Can you recommend one?

9. aschobel ◴[30 Jul 25 20:31 UTC] No.44739153[source]▶

>>44727490 #

"You can just do things".

Agreed on all fronts. jq and AppleScript are a total syntax mystery to me, but now I use them all the times since claude code has figured them out.

It's so powerful knowing the shape of a solution on not having to care about the details.

↑