You will effectively want a 48GB card or more for quantized versions, otherwise you won't have meaningful space left for the KV cache. Blackwell and above is generally a good idea to get faster hardware support for 4b (some recent models took some time to ship for older architectures, gpt-oss IIRC).