
361 points | mseri | 1 comment | source
tcsenpai ◴[] No.46002818[source]
I think they should start aiming for 20B models along with the 32B and 7B ones. Usually 7B is enough for an 8GB GPU, 32B requires a 24GB GPU for decent quants (I can fit a 32B with IQ3_XXS but it's not ideal), while 20-ish B models (such as Magistral or gpt-oss) are a perfect fit for 16GB GPUs.
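(A rough back-of-the-envelope behind those fits, as a minimal sketch rather than anything from the comment itself: the bits-per-weight figures and the ~10% overhead factor below are approximations for GGUF-style quants, and KV cache for long contexts is ignored.)

    # Approximate VRAM needed for a dense model's weights at a given quant.
    # Illustrative only: bits-per-weight are rough GGUF averages, plus ~10%
    # overhead for buffers; a long context's KV cache would add more on top.
    QUANT_BITS = {"Q8_0": 8.5, "Q4_K_M": 4.8, "IQ3_XXS": 3.1}

    def vram_gb(params_b: float, quant: str, overhead: float = 1.10) -> float:
        # 1B params at 1 byte/weight is ~1 GB, so scale by bytes per weight.
        return params_b * (QUANT_BITS[quant] / 8) * overhead

    for size in (7, 20, 32):
        for quant in ("Q4_K_M", "IQ3_XXS"):
            print(f"{size}B @ {quant}: ~{vram_gb(size, quant):.1f} GB")
    # -> 7B @ Q4_K_M fits an 8GB card, 20B fits 16GB, 32B wants 24GB
    #    unless you drop to IQ3_XXS (~13-14 GB)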
replies(1): >>46004184 #
embedding-shape ◴[] No.46004184[source]
Depends heavily on the architecture too. I think the free-for-all to find the best sizes is still kind of ongoing, and rightly so. GPT-OSS-120B, for example, fits in around 61GB of VRAM for me when quantized to MXFP4.
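(A quick sanity check on that figure, under some assumptions not stated in the thread: MXFP4 packs 4-bit values with one shared 8-bit scale per 32-element block, so roughly 4.25 bits/weight, and gpt-oss-120b has about 117B total parameters; real loaders keep some tensors at higher precision, so this is only a ballpark.)

    # Ballpark weight footprint of GPT-OSS-120B at MXFP4 (assumptions above).
    params = 117e9
    bits_per_weight = 4 + 8 / 32               # ~4.25 bits incl. block scales
    print(params * bits_per_weight / 8 / 1e9)  # ~62 GB, close to the observed ~61GB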

Personally, I hope GPU makers instead start adding more VRAM, or if one can dream, expandable VRAM.

replies(1): >>46004680 #
refulgentis ◴[] No.46004680[source]
Unlikely to see more VRAM in the short term; memory prices are through the roof :/ like, not subtly, 2-4x.
replies(1): >>46005014 #
embedding-shape ◴[] No.46005014[source]
Well, GPUs are getting more VRAM, although it's pricey. We didn't use to have 96GB VRAM GPUs at all; now they exist :) For those who can afford it, it's at least possible today, and the amounts are slowly increasing.
replies(2): >>46005248 #>>46005320 #
1. ◴[] No.46005248[source]