Wish I knew better how to estimate what sized video card one needs. HuggingFace link says this is bfloat16, so at least 64GB?
I guess the -7B might run on my 16GB AMD card?
replies(4):
I guess the -7B might run on my 16GB AMD card?
Also it's entirely possible to run a model that doesn't fit in available GPU memory, it will just be slower.
That will help you quickly calculate the model VRAM usage as well as the VRAM usage of the context length you want to use. You can put "Qwen/Qwen2.5-VL-32B-Instruct" in the "Model (unquantized)" field. Funnily enough the calculator lacks the option to see without quantizing the model, usually because nobody worried about VRAM bothers running >8 bit quants.
I expect more and more worthwhile models to natively have <16 bit weights as time goes on but for the moment it's pretty much "8 bit DeepSeek and some research/testing models of various parameter width".