
577 points simonw | 5 comments
NitpickLawyer ◴[] No.44723522[source]
> Two years ago when I first tried LLaMA I never dreamed that the same laptop I was using then would one day be able to run models with capabilities as strong as what I’m seeing from GLM 4.5 Air—and Mistral 3.2 Small, and Gemma 3, and Qwen 3, and a host of other high quality models that have emerged over the past six months.

Yes, the open models have surpassed my expectations in both quality and speed of release. For a bit of context, when ChatGPT launched in November 2022, the "best" open models were GPT-J (~6B) and GPT-NeoX (~20B). I actually had an app running live, with users, on GPT-J for about a month. It was a pain. The quality was abysmal and there was no instruction following (you had to start your prompt like a story, or come up with a bunch of examples and hope the model would follow along), and so on.

And then something happened: the LLaMA models got "leaked" (I still think it was an intentional leak: "don't sue us, we never meant to release it", etc.), and the rest is history. With LLaMA 1 we got lots of optimisations like quantised models, fine-tuning and so on; LLaMA 2 really saw fine-tuning take off (most of the fine-tunes were better than what Meta released); we got Alpaca showing off LoRA; and then a bunch of really strong models came out (the Mistrals, Mixtrals, Llama 3, Gemmas, Qwens, DeepSeeks, GLMs, Granites, etc.).

By some estimates the open models are ~6 months behind what the SotA labs have released. (Note that doesn't mean the labs are releasing their best models; it's likely they keep those in house to use for data curation on the next runs, synthetic datasets, distilling, etc.) Being 6 months behind is NUTS! I never in my wildest dreams believed we'd be here. In fact I thought it would take ~2 years to reach GPT-3.5 levels. It's really something insane that we get to play with these models "locally", fine-tune them and so on.

replies(4): >>44723679 #>>44724534 #>>44726611 #>>44734796 #
genewitch ◴[] No.44724534[source]
I'll bite. How do I train/make and/or use a LoRA, or, separately, how do I fine-tune? I've been asking this for months, and no one has a decent answer. Web search on my end turns up SEO/GEO spam with no real instructions.

I know how to make a Stable Diffusion LoRA and use it. I've known how to do that for two years. So what's the big secret about LLM LoRAs?

replies(9): >>44724589 #>>44724702 #>>44724887 #>>44725233 #>>44725409 #>>44727383 #>>44727527 #>>44729225 #>>44731516 #
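For concreteness, here is a minimal sketch of what an LLM LoRA setup looks like with the Hugging Face peft library; the model name and hyperparameters below are illustrative placeholders (not recommendations), and exact APIs vary by version.

```python
# Minimal LoRA setup sketch using Hugging Face transformers + peft.
# The model name and hyperparameters are placeholders, not recommendations.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"   # placeholder: any causal LM you can run locally
tokenizer = AutoTokenizer.from_pretrained(model_name)  # used later to tokenize your data
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# LoRA freezes the base weights and learns small low-rank update matrices,
# here attached to the attention query/value projections.
lora_cfg = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()         # typically well under 1% of total params

# From here, train as you would any causal LM: tokenize your instruction data
# and run it through transformers.Trainer (or trl's SFTTrainer).
# model.save_pretrained("my-adapter") then writes only the adapter weights.
```

The workflow mirrors image-model LoRAs: the base weights stay frozen, only the small adapter matrices are trained, and the saved adapter is loaded on top of the base model at inference time.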
qcnguy ◴[] No.44725409[source]
LLM fine-tuning tends to destroy the model's capabilities if you aren't very careful. It's not as easy or effective as with image generation.
replies(2): >>44729336 #>>44732817 #
1. israrkhan ◴[] No.44729336[source]
Do you have a suggestion or a way to measure whether model capabilities are getting destroyed? How does one measure it objectively?
replies(2): >>44729673 #>>44733183 #
2. RALaBarge ◴[] No.44729673[source]
Ask it the same series of questions after training that you posed before training started. Is the quality lower?
replies(1): >>44731429 #
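A rough sketch of that spot check, assuming a Hugging Face transformers setup (the model names are placeholders): generate answers to a fixed probe set from the base model and the fine-tuned model, then compare by hand.

```python
# Before/after spot check: run a fixed probe set through the base model and
# the fine-tuned model and eyeball the answers. Model names are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

def load(name):
    return AutoTokenizer.from_pretrained(name), AutoModelForCausalLM.from_pretrained(name)

def answer(tok, model, prompt, max_new_tokens=128):
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

probes = [
    "Explain what a LoRA adapter is in two sentences.",
    "Write a Python function that reverses a string.",
    "Summarise the plot of Hamlet in one paragraph.",
]

before = load("base-model")            # placeholder: the model you started from
after = load("my-finetuned-model")     # placeholder: the fine-tuned checkpoint

for p in probes:
    print("PROMPT:", p)
    print("BEFORE:", answer(*before, p))
    print("AFTER: ", answer(*after, p))
    print("-" * 60)
```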
3. israrkhan ◴[] No.44731429[source]
That series of questions will measure only a particular area. I am concerned about destroying model capabilities in some other area that I don't pay attention to and have no way of knowing about.
replies(1): >>44731676 #
4. simonh ◴[] No.44731676{3}[source]
Isn’t that a general problem with LLMs? The only way to know how good it is at something is to test it.
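One way to test more broadly than a hand-written probe set is to run a standard benchmark suite before and after fine-tuning, for example with EleutherAI's lm-evaluation-harness. The snippet below sketches its Python entry point; the exact API and task names vary by version, and the model paths are placeholders.

```python
# Sketch: compare base and fine-tuned checkpoints on a few standard benchmarks
# with lm-evaluation-harness. API details vary by version; treat this as a sketch.
import lm_eval

for name in ["base-model", "my-finetuned-model"]:        # placeholder paths
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={name}",
        tasks=["hellaswag", "arc_easy", "gsm8k"],         # pick tasks near the areas you care about
        batch_size=8,
    )
    print(name, results["results"])
```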
5. mensetmanusman ◴[] No.44733183[source]
These are now questions at the cutting edge of academic research. It might be computationally unknowable until checked.