
361 points by mseri | 6 comments
tcsenpai No.46002818
I think they should start aiming for 20B models along with 32B and 7B. Usually 7B is enough for an 8GB GPU, and 32B requires a 24GB GPU for decent quants (I can fit a 32B with IQ3_XXS, but it's not ideal), while 20-ish B models (such as Magistral or gpt-oss) are a perfect fit for 16GB GPUs.
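A rough way to sanity-check those fits (a back-of-the-envelope sketch, not a benchmark; the bits-per-weight figures are approximate and the 1.2x overhead allowance for KV cache and runtime buffers is an assumption):

    # Weights take roughly params * bits_per_weight / 8 bytes; the 1.2x
    # factor is an assumed allowance for KV cache, activations and buffers.
    def vram_gb(params_b, bits_per_weight, overhead=1.2):
        return params_b * bits_per_weight / 8 * overhead

    for name, params_b, bpw in [
        ("7B @ Q4_K_M (~4.8 bpw)", 7, 4.8),     # ~5 GB  -> fits an 8GB card
        ("20B @ MXFP4 (~4.25 bpw)", 20, 4.25),  # ~13 GB -> fits a 16GB card
        ("32B @ IQ3_XXS (~3.1 bpw)", 32, 3.1),  # ~15 GB -> tight on 16GB
        ("32B @ Q4_K_M (~4.8 bpw)", 32, 4.8),   # ~23 GB -> wants a 24GB card
    ]:
        print(f"{name}: ~{vram_gb(params_b, bpw):.0f} GB")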
replies(1): >>46004184 #
embedding-shape No.46004184
Depends heavily on the architecture too; I think the free-for-all to find the best sizes is still ongoing, and rightly so. GPT-OSS-120B, for example, fits in around 61GB of VRAM for me when quantized to MXFP4.
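As a rough sanity check on that figure (assuming MXFP4's ~4.25 bits per weight, i.e. 4-bit values plus a shared 8-bit scale per 32-element block, and the published ~117B total parameter count):

    params = 117e9                 # GPT-OSS-120B total parameters, ~117B
    bpw = 4 + 8 / 32               # MXFP4: ~4.25 bits per weight
    print(params * bpw / 8 / 1e9)  # ~62 GB of weights, in line with the ~61GB observed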

Personally, I hope GPU makers instead start adding more VRAM or, if one can dream, expandable VRAM.

replies(1): >>46004680 #
refulgentis No.46004680
Unlikely to see more VRAM in the short term; memory prices are through the roof :/ like, not subtly, 2-4x.
replies(1): >>46005014 #
embedding-shape No.46005014
Well, GPUs are getting more VRAM, although it's pricey. We didn't use to have 96GB VRAM GPUs at all; now they exist :) For those who can afford it, it's at least possible today. Slowly it increases.
replies(1): >>46005320 #
refulgentis No.46005320
Agreed, in the limit, RAM goes up. As billg knows, 640K definitely wasn't enough for everyone :)
replies(1): >>46005500 #
embedding-shape No.46005500
I'm already thinking 96GB might not be enough, and I've only had this GPU for 6 months or so :|
replies(1): >>46007184 #
refulgentis No.46007184
Hehe, me too… went all out on an MBP in 2022, did it again in April. The only upgrade I didn't bother with was topping out at 128 GB of RAM instead of 64. Then GPT-OSS-120B comes out and quickly makes me very sad I can't use it locally.
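(By the arithmetic upthread, the MXFP4 weights alone come to roughly 62 GB before any KV cache, and macOS reserves a slice of unified memory for the system, so a 64 GB machine genuinely can't hold it.)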
replies(1): >>46010341 #
anon373839 No.46010341
Same. I repeatedly kick myself for not getting the 128GB version, although not for the GPT-OSS model, because I really haven't been too impressed with it (through cloud providers). But now it's best to wait until the M5 Max is out, given the new GPU neural accelerators that should greatly speed up prompt processing.