
361 points by mseri | 6 comments
tcsenpai No.46002818
I think they should start aiming for 20B models along with 32B and 7B. Usually 7B is enough for an 8GB GPU, and 32B requires a 24GB GPU for decent quants (I can fit a 32B with IQ3_XXS, but it's not ideal), while 20-ish B models (such as Magistral or gpt-oss) are a perfect fit for 16GB GPUs.
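A rough way to sanity-check those fits (a back-of-the-envelope sketch, not a benchmark; the bits-per-weight figures are approximate and the 1.2x overhead allowance for KV cache and runtime buffers is an assumption):

    # Weights take roughly params * bits_per_weight / 8 bytes; the 1.2x
    # factor is an assumed allowance for KV cache, activations and buffers.
    def vram_gb(params_b, bits_per_weight, overhead=1.2):
        return params_b * bits_per_weight / 8 * overhead

    for name, params_b, bpw in [
        ("7B @ Q4_K_M (~4.8 bpw)", 7, 4.8),     # ~5 GB  -> fits an 8GB card
        ("20B @ MXFP4 (~4.25 bpw)", 20, 4.25),  # ~13 GB -> fits a 16GB card
        ("32B @ IQ3_XXS (~3.1 bpw)", 32, 3.1),  # ~15 GB -> tight on 16GB
        ("32B @ Q4_K_M (~4.8 bpw)", 32, 4.8),   # ~23 GB -> wants a 24GB card
    ]:
        print(f"{name}: ~{vram_gb(params_b, bpw):.0f} GB")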
replies(1): >>46004184 #
embedding-shape No.46004184
Depends heavily on the architecture too; I think the free-for-all to find the best sizes is still ongoing, and rightly so. GPT-OSS-120B, for example, fits in around 61GB of VRAM for me when quantized to MXFP4.
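As a rough sanity check on that figure (assuming MXFP4's ~4.25 bits per weight, i.e. 4-bit values plus a shared 8-bit scale per 32-element block, and the published ~117B total parameter count):

    params = 117e9                 # GPT-OSS-120B total parameters, ~117B
    bpw = 4 + 8 / 32               # MXFP4: ~4.25 bits per weight
    print(params * bpw / 8 / 1e9)  # ~62 GB of weights, in line with the ~61GB observed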

Personally, I hope GPU makers instead start adding more VRAM or, if one can dream, expandable VRAM.

replies(1): >>46004680 #
refulgentis No.46004680
Unlikely to see more VRAM in the short term; memory prices are through the roof :/ like, not subtly, 2-4x.
replies(1): >>46005014 #
embedding-shape No.46005014
Well, GPUs are getting more VRAM, although it's pricey. We didn't use to have 96GB VRAM GPUs at all; now they exist :) For those who can afford it, it's at least possible today. Slowly it increases.
replies(1): >>46005320 #
refulgentis No.46005320
Agreed, in the limit, RAM goes up. As billg knows, 640K definitely wasn't enough for everyone :)
replies(1): >>46005500 #
embedding-shape No.46005500
I'm already thinking 96GB might not be enough, and I've only had this GPU for 6 months or so :|
replies(1): >>46007184 #
refulgentis No.46007184
Hehe, me too… went all out on an MBP in 2022, did it again in April. The only upgrade I didn't bother with was topping out at 128 GB of RAM instead of 64. Then GPT-OSS-120B comes out and quickly makes me very sad I can't use it locally.
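(By the arithmetic upthread, the MXFP4 weights alone come to roughly 62 GB before any KV cache, and macOS reserves a slice of unified memory for the system, so a 64 GB machine genuinely can't hold it.)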
replies(1): >>46010341 #
anon373839 No.46010341
Same. I repeatedly kick myself for not getting the 128GB version, although not for the GPT-OSS model, because I really haven't been too impressed with it (through cloud providers). But now it's best to wait until the M5 Max is out, given the new GPU neural accelerators that should greatly speed up prompt processing.