I recently made a little tool that helps people interested in running local LLMs figure out whether their hardware can fit an LLM in GPU memory.
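For context, the usual back-of-envelope check a tool like this makes is weights-dominated: param count times bytes per param, plus some headroom. A minimal sketch in Python - the constants here are my own rough assumptions, not necessarily what the tool uses:

    def fits_in_vram(params_billions, vram_gb,
                     bytes_per_param=2.0,  # fp16/bf16 weights; ~0.5-1.0 for 4-8 bit quants
                     overhead=1.2):        # rough headroom for KV cache / activations
        # weights dominate: params * bytes per param, scaled up for headroom
        needed_gb = params_billions * bytes_per_param * overhead
        return needed_gb <= vram_gb

    print(fits_in_vram(7, 24))   # True  (~16.8 GB needed)
    print(fits_in_vram(70, 24))  # False (~168 GB needed)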
replies(10):
Generally speaking, models seem to be bucketed by param count (3b, 7b, 8b, 14b, 34b, 70b), so for a given VRAM bucket you end up being able to run thousands of models - is it actually valuable to show thousands of models?
My bet is "no" - what would really be valuable is something like the top 50 trending models on HuggingFace that fit in your VRAM bucket, so I will try to build that.
Would love your thoughts on that though - does that sound like a good idea?
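To sketch what I mean, here is a rough cut with the huggingface_hub client. I'm sorting by downloads as a stand-in for the Hub's trending ranking and guessing the param count from the model id, both of which are shortcuts rather than how a real version should work:

    import re
    from huggingface_hub import list_models

    def models_for_vram(vram_gb, limit=50,
                        bytes_per_param=2.0, overhead=1.2):
        size_re = re.compile(r"(\d+(?:\.\d+)?)\s*[bB]\b")  # matches "7b", "1.5B", ...
        picks = []
        # filter by the "text-generation" tag; downloads as a proxy for trending
        for m in list_models(filter="text-generation",
                             sort="downloads", direction=-1, limit=500):
            match = size_re.search(m.id)
            if not match:
                continue  # skip models that don't put their size in the name
            needed_gb = float(match.group(1)) * bytes_per_param * overhead
            if needed_gb <= vram_gb:
                picks.append(m.id)
            if len(picks) >= limit:
                break
        return picks

    print("\n".join(models_for_vram(vram_gb=24)))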