And you could definitely add some referral links for a bit of revenue.
- Use natural language to describe offloading requirements.
- Just the launch year of the LLM (from the HF URL) can help show whether it’s an outdated or a cutting-edge LLM.
- VLMs/Embedding models are missing?
Detecting CPU and GPU specs browser-side is almost impossible to do reliably, even with advanced fingerprinting and heuristics.
For GPUs, it may be possible to use (1) WebGL’s `WEBGL_debug_renderer_info` extension [0] or (2) WebGPU’s `GPUAdapter#info` [1], but I wouldn’t trust either of those APIs for general usage.
[0]: https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_debug...
[1]: https://developer.mozilla.org/en-US/docs/Web/API/GPUAdapter/...
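For reference, a minimal sketch of what the WebGL route exposes (the extension and parameter names below are the real WebGL API; note that browsers increasingly return masked or generic strings):

```ts
// Minimal sketch: read the (possibly masked) GPU vendor/renderer strings via WebGL.
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl");
const ext = gl?.getExtension("WEBGL_debug_renderer_info");
if (gl && ext) {
  const vendor = gl.getParameter(ext.UNMASKED_VENDOR_WEBGL);
  const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
  // e.g. "Google Inc. (NVIDIA)" / "ANGLE (NVIDIA, NVIDIA GeForce RTX ...)"
  console.log(vendor, renderer);
}
```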
Generally speaking, models seem to be bucketed by param count (3b, 7b, 8b, 14b, 34b, 70b), so for a given VRAM bucket you end up able to run thousands of models. Is it valuable to show thousands of models?
My bet is "No". What's really valuable is something like the top 50 trending models on HuggingFace that fit in your VRAM bucket, so I will try to build that.
Would love your thoughts on that, though: does that sound like a good idea?
When it comes to "how to do the math", this repo was my starting point: https://github.com/Raskoll2/LLMcalc
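For anyone curious, here's a back-of-envelope version of that math (a simplified sketch of the idea, not LLMcalc's exact formula; the ~4.5 effective bits for Q4 and the 1.2 overhead factor are my own assumptions):

```ts
// Back-of-envelope VRAM estimate: weights ≈ params * bits-per-weight / 8,
// plus ~20% headroom for KV cache, activations, and runtime overhead.
function estimateVramGb(paramsB: number, bitsPerWeight: number, overhead = 1.2): number {
  return (paramsB * bitsPerWeight / 8) * overhead; // paramsB in billions → GB
}

// Map a VRAM budget to the largest common param bucket that still fits.
const BUCKETS_B = [3, 7, 8, 14, 34, 70];
function largestBucketThatFits(vramGb: number, bitsPerWeight = 4.5): number | null {
  const fits = BUCKETS_B.filter((b) => estimateVramGb(b, bitsPerWeight) <= vramGb);
  return fits.length > 0 ? fits[fits.length - 1] : null;
}

console.log(estimateVramGb(7, 4.5).toFixed(1)); // ≈ 4.7 GB for a 7b model at ~Q4
console.log(largestBucketThatFits(24)); // 34 → a 24 GB card fits a 34b at ~Q4
```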
I did look at auto-detecting before, but it seems like you can only really tell the features of a GPU, not the genuinely useful info (VRAM amount and bus speed). Is that the case?
I looked at the GPUAdapter docs, and all it told me was:
- device maker (amd)
- architecture (rdna-3)
and that was it. Is there a way to poke for bus speed and VRAM amount?
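For reference, this is roughly all you can poke at today (a sketch using the real WebGPU `GPUAdapter` surface; note that `limits.maxBufferSize` is a driver-imposed allocation cap, not the VRAM amount):

```ts
// Sketch of what WebGPU's adapter actually exposes (run in an async context).
async function probeAdapter(): Promise<void> {
  if (!("gpu" in navigator)) return; // WebGPU not supported
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return;
  // Only coarse identity fields: vendor ("amd"), architecture ("rdna-3"), etc.
  console.log(adapter.info.vendor, adapter.info.architecture);
  // A driver/browser allocation cap, NOT the physical VRAM amount.
  console.log(adapter.limits.maxBufferSize);
}
```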
I appreciate that it's a heavy site, but being honest with you: it doesn't seem worth the time to optimise this by moving to a lighter framework at this stage of the project.
Sorry!
- Use natural language to describe offloading requirements.

Do you mean remove the JSON thing and just summarise the offloading requirements?

- Just the launch year of the LLM (from the HF URL) can help show whether it’s an outdated or a cutting-edge LLM.

Great idea! I will try to add this tonight.

- VLMs/Embedding models are missing?

Yeah, I only have text-generation models at the moment, as that is by far where most of the interest is. I will look at adding other model types in a later update, but it won't be before the weekend.

Feature request: have a leaderboard of LLMs for x/y/z tasks (or pull one from an existing repo) and suggest the best model for a given GPU for an x/y/z task.
If there is a better model that my GPU can run, why should I go for the lowest?
Unfortunately, I’m not aware of any way to reliably get this type of information browser-side.
If you really want to see how far you can get, your best bet would be fingerprinting. That would require some upfront work: combining the manual input you already collect with data gathered in the background (especially timing-related data). With enough users manually entering their specs and enough independently collected data, you'd probably be surprised at how accurate fingerprinting can get.
That said, please do NOT go the fingerprinting route, because (1) a lot of users (including myself) hate being fingerprinted (especially if done covertly), (2) you need quite a lot of data before you can do anything useful, and (3) it’s obviously not worth the effort for what you’re building.