(github.com)

210 points blackcat201 | 2 comments | 31 Oct 25 00:07 UTC


amoskvin [31 Oct 25 08:51 UTC] No.45769758

Any hardware recommendations? How much memory do we need for this?

1. uniqueuid [31 Oct 25 09:39 UTC] No.45770061

You will effectively want a 48GB card or more for quantized versions; otherwise you won't have meaningful space left for the KV cache. Blackwell or newer is generally a good idea for faster hardware support for 4-bit (some recent models took a while to ship for older architectures, gpt-oss IIRC).

replies(1): >>45771736
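The "space left for the KV cache" point can be made concrete with a rough back-of-the-envelope sketch. It assumes a standard full-attention KV cache (one K and one V vector per KV head, per layer, per token, in fp16). The layer count, KV-head count, head dimension, and the 24 GiB of quantized weights below are hypothetical illustration values, not Kimi Linear's actual configuration. A hybrid design with linear-attention layers keeps a fixed-size state in those layers, so for such a model you would pass only the full-attention layer count as `num_layers`.

```python
GIB = 1024 ** 3


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # K and V (factor 2), each num_kv_heads * head_dim values, in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(vram_bytes: int, weight_bytes: int, per_token: int,
                       overhead_bytes: int = 0) -> int:
    # Whatever VRAM the weights (and runtime overhead) leave free goes to the cache.
    free = vram_bytes - weight_bytes - overhead_bytes
    return max(free, 0) // per_token


# Hypothetical config: 48 full-attention layers, 8 KV heads, head_dim 128, fp16 cache.
per_tok = kv_bytes_per_token(num_layers=48, num_kv_heads=8, head_dim=128)
# Hypothetical 24 GiB of 4-bit weights on a 48 GiB card.
tokens_48 = max_context_tokens(48 * GIB, 24 * GIB, per_tok)
# The same weights on a 24 GiB card leave no room for the cache at all.
tokens_24 = max_context_tokens(24 * GIB, 24 * GIB, per_tok)
print(per_tok, tokens_48, tokens_24)
```

With these made-up numbers each token costs 192 KiB of cache, so the 48 GiB card fits about 128K tokens of context while the 24 GiB card fits none, which is the gist of the comment.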

2. samus [31 Oct 25 13:24 UTC] No.45771736

>>45770061 (TP)

This is a Mixture-of-Experts model with only 3B parameters activated per token. But I agree that for the intended usage scenario, VRAM for the KV cache is the real limitation.
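A rough sketch of why "3B activated" helps speed more than memory. It assumes every expert stays resident in VRAM (no offloading to system RAM) and uses the common approximation of about 2 FLOPs (multiply plus add) per active parameter per token. The 48B total parameter count is an assumption for illustration; this thread only mentions the 3B active figure.

```python
def moe_memory_and_compute(total_params: float, active_params: float,
                           bits_per_weight: float) -> tuple[float, float]:
    # All experts must be loaded, so weight storage scales with *total* params.
    weight_gb = total_params * bits_per_weight / 8 / 1e9
    # Per-token compute scales with *active* params: ~2 FLOPs per parameter.
    gflops_per_token = 2 * active_params / 1e9
    return weight_gb, gflops_per_token


# Hypothetical: 48B total params, 3B active per token, 4-bit weights.
mem_gb, gflops = moe_memory_and_compute(total_params=48e9, active_params=3e9,
                                        bits_per_weight=4)
print(f"weights: {mem_gb:.1f} GB, compute: {gflops:.1f} GFLOPs/token")
```

Under these assumptions the weights alone take about 24 GB even though each token only costs about 6 GFLOPs, so the small active count makes decoding cheap but does not shrink the card you need; the KV cache then competes for what is left.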


Kimi Linear: An Expressive, Efficient Attention Architecture