
544 points | tosh | 4 comments
gatienboquet No.43464396
So today it's Qwen. Tomorrow a new SOTA model from Google, apparently, and R2 next week.

We haven't hit the wall yet.

1. OsrsNeedsf2P No.43465234
> We haven't hit the wall yet.

The models are iterative improvements, but I haven't seen a night-and-day difference since the jump from GPT-3 to 3.5.

2. anon373839 No.43465478
Yeah. Scaling up pretraining and huge models appears to be done. But I think we're still advancing the frontier in the other direction -- i.e., how much capability and knowledge can we cram into smaller and smaller models?
3. Davidzheng No.43467288
Tbh, a jump that big from current capability would already be ASI.
4. YetAnotherNick No.43468261
Because 3.5 had a genuinely new capability: following instructions. Right now we're at the 3.5 stage for conversational AI and native image generation, both of which feel magical.