
544 points tosh | 1 comment
gatienboquet No.43464396
So today it's Qwen. Tomorrow, apparently, a new SOTA model from Google; R2 next week.

We haven't hit the wall yet.

OsrsNeedsf2P No.43465234
> We haven't hit the wall yet.

The models are iterative improvements, but I haven't seen a night-and-day difference since the jump from GPT-3 to GPT-3.5.

anon373839 No.43465478
Yeah. Scaling up pretraining and huge models appears to be done. But I think we're still advancing the frontier in the other direction: how much capability and knowledge can we cram into smaller and smaller models?
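
One common route to the small-model direction described above is knowledge distillation: train a small student to match a large teacher's temperature-softened output distribution rather than just the hard labels. A minimal dependency-free sketch of the distillation loss (function names and constants are illustrative, not from the thread):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-top classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradients keep roughly the same magnitude
    # as training at T=1.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Identical logits -> zero loss; diverging logits -> positive loss.
same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])   # 0.0
diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])   # > 0
print(same, diff)
```

In practice this term is usually mixed with the ordinary cross-entropy on ground-truth labels, and the same idea underlies many of the small open-weight models the thread is discussing.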