
361 points | mseri | 1 comment | source
tcsenpai ◴[] No.46002818[source]
I think they should start aiming for 20B models along with the 32B and 7B ones. Usually 7B is enough for an 8GB GPU, 32B requires a 24GB GPU for decent quants (I can fit a 32B with IQ3_XXS but it's not ideal), while 20-ish B models (such as Magistral or gpt-oss) are a perfect fit for 16GB GPUs.
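(A rough back-of-the-envelope behind those fits, as a minimal sketch rather than anything from the comment itself: the bits-per-weight figures and the ~10% overhead factor below are approximations for GGUF-style quants, and KV cache for long contexts is ignored.)

    # Approximate VRAM needed for a dense model's weights at a given quant.
    # Illustrative only: bits-per-weight are rough GGUF averages, plus ~10%
    # overhead for buffers; a long context's KV cache would add more on top.
    QUANT_BITS = {"Q8_0": 8.5, "Q4_K_M": 4.8, "IQ3_XXS": 3.1}

    def vram_gb(params_b: float, quant: str, overhead: float = 1.10) -> float:
        # 1B params at 1 byte/weight is ~1 GB, so scale by bytes per weight.
        return params_b * (QUANT_BITS[quant] / 8) * overhead

    for size in (7, 20, 32):
        for quant in ("Q4_K_M", "IQ3_XXS"):
            print(f"{size}B @ {quant}: ~{vram_gb(size, quant):.1f} GB")
    # -> 7B @ Q4_K_M fits an 8GB card, 20B fits 16GB, 32B wants 24GB
    #    unless you drop to IQ3_XXS (~13-14 GB)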
replies(1): >>46004184 #
embedding-shape ◴[] No.46004184[source]
Depends heavily on the architecture too. I think the free-for-all to find the best sizes is still kind of ongoing, and rightly so. GPT-OSS-120B, for example, fits in around 61GB of VRAM for me when quantized to MXFP4.
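(A quick sanity check on that figure, under some assumptions not stated in the thread: MXFP4 packs 4-bit values with one shared 8-bit scale per 32-element block, so roughly 4.25 bits/weight, and gpt-oss-120b has about 117B total parameters; real loaders keep some tensors at higher precision, so this is only a ballpark.)

    # Ballpark weight footprint of GPT-OSS-120B at MXFP4 (assumptions above).
    params = 117e9
    bits_per_weight = 4 + 8 / 32               # ~4.25 bits incl. block scales
    print(params * bits_per_weight / 8 / 1e9)  # ~62 GB, close to the observed ~61GB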

Personally, I hope GPU makers instead start adding more VRAM, or if one can dream, expandable VRAM.

replies(1): >>46004680 #
refulgentis ◴[] No.46004680[source]
Unlikely to see more VRAM in the short term; memory prices are through the roof :/ like, not subtly, 2-4x.
replies(1): >>46005014 #
embedding-shape ◴[] No.46005014[source]
Well, GPUs are getting more VRAM, although it's pricey. We didn't use to have 96GB VRAM GPUs at all; now they exist :) For those who can afford it, it's at least possible today, and the amounts are slowly increasing.
replies(2): >>46005248 #>>46005320 #
1. ◴[] No.46005248[source]