544 points tosh | 13 comments
    1. gatienboquet ◴[] No.43464396[source]
So today is Qwen. Tomorrow, apparently, a new SOTA model from Google; R2 next week.

    We haven't hit the wall yet.

    replies(6): >>43464672 #>>43464706 #>>43464975 #>>43465234 #>>43465549 #>>43472639 #
    2. zamadatix ◴[] No.43464672[source]
Qwen 3 is coming imminently as well (https://github.com/huggingface/transformers/pull/36878), and it feels like Llama 4 should arrive in the next month or so.

That said, none of the recent string of releases has done much yet to "smash a wall"; they've just met the larger proprietary models where they already were. I'm hoping R2 or the like really changes that by showing that ChatGPT 3->3.5 or 3.5->4 level generational jumps are still possible beyond the current state of the art, not just beyond current models of a given size.

    replies(1): >>43468250 #
    3. tomdekan ◴[] No.43464706[source]
    Any more info on the new Google model?
    4. behnamoh ◴[] No.43464975[source]
Google's announcements are mostly vaporware anyway. By the way, where is Gemini Ultra 1? How about Gemini Ultra 2?
    replies(2): >>43465070 #>>43468100 #
    5. karmasimida ◴[] No.43465070[source]
It is already on the LLM arena, right? Codename Nebula? But you are right, they can fuck up their releases royally.
    6. OsrsNeedsf2P ◴[] No.43465234[source]
    > We haven't hit the wall yet.

The models are iterative improvements, but I haven't seen night-and-day differences since GPT-3 and 3.5.

    replies(3): >>43465478 #>>43467288 #>>43468261 #
    7. anon373839 ◴[] No.43465478[source]
    Yeah. Scaling up pretraining and huge models appears to be done. But I think we're still advancing the frontier in the other direction -- i.e., how much capability and knowledge can we cram into smaller and smaller models?
    8. nwienert ◴[] No.43465549[source]
We've slid into the upper part of the S-curve, though.
    9. Davidzheng ◴[] No.43467288[source]
To be honest, such a big jump from current capability would be ASI already.
    10. aoeusnth1 ◴[] No.43468100[source]
I guess they don’t do Ultras anymore, but where was the announcement for that? What other announcement was vaporware?
    11. YetAnotherNick ◴[] No.43468250[source]
    > met the larger proprietary models where they already were

    This is smashing the wall.

Also, if you just care about breaking absolute numbers: OpenAI released 4.5 a month back, which is SOTA among base models, and plans to release full o3 in maybe a month; and DeepSeek released a new V3, which is again SOTA in many aspects.

    12. YetAnotherNick ◴[] No.43468261[source]
Because 3.5 had a new capability: following instructions. Right now we are in the 3.5 range in conversational AI and native image generation, both of which feel magical.
    13. intalentive ◴[] No.43472639[source]
Asymptotic improvement will never hit a wall.