And you could definitely add some referral links for a bit of revenue.
- Use natural language to describe offloading requirements.
- Just the launch year of the LLM (from the HF URL) can help show whether it’s an outdated or a cutting-edge LLM.
- VLMs/Embedding models are missing?
Detecting CPU and GPU specs browser-side is almost impossible to do reliably, even with advanced fingerprinting and heuristics.
For GPUs, it may be possible to use (1) WebGL’s `WEBGL_debug_renderer_info` extension [0] or (2) WebGPU’s `GPUAdapter#info` [1], but I wouldn’t trust either of those APIs for general usage.
[0]: https://developer.mozilla.org/en-US/docs/Web/API/WEBGL_debug...
[1]: https://developer.mozilla.org/en-US/docs/Web/API/GPUAdapter/...
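For reference, a minimal sketch of what the WebGL route exposes (the extension and parameter names below are the real WebGL API; note that browsers increasingly return masked or generic strings):

```ts
// Minimal sketch: read the (possibly masked) GPU vendor/renderer strings via WebGL.
const canvas = document.createElement("canvas");
const gl = canvas.getContext("webgl");
const ext = gl?.getExtension("WEBGL_debug_renderer_info");
if (gl && ext) {
  const vendor = gl.getParameter(ext.UNMASKED_VENDOR_WEBGL);
  const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
  // e.g. "Google Inc. (NVIDIA)" / "ANGLE (NVIDIA, NVIDIA GeForce RTX ...)"
  console.log(vendor, renderer);
}
```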
Generally speaking, models seem to be bucketed by param count (3b, 7b, 8b, 14b, 34b, 70b), so for a given VRAM bucket you end up able to run thousands of models. Is it valuable to show thousands of models?
My bet is "No". What's really valuable is something like the top 50 trending models on HuggingFace that fit in your VRAM bucket, so I will try to build that.
Would love your thoughts on that, though: does that sound like a good idea?
When it comes to "how to do the math", this repo was my starting point: https://github.com/Raskoll2/LLMcalc
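For anyone curious, here's a back-of-envelope version of that math (a simplified sketch of the idea, not LLMcalc's exact formula; the ~4.5 effective bits for Q4 and the 1.2 overhead factor are my own assumptions):

```ts
// Back-of-envelope VRAM estimate: weights ≈ params * bits-per-weight / 8,
// plus ~20% headroom for KV cache, activations, and runtime overhead.
function estimateVramGb(paramsB: number, bitsPerWeight: number, overhead = 1.2): number {
  return (paramsB * bitsPerWeight / 8) * overhead; // paramsB in billions → GB
}

// Map a VRAM budget to the largest common param bucket that still fits.
const BUCKETS_B = [3, 7, 8, 14, 34, 70];
function largestBucketThatFits(vramGb: number, bitsPerWeight = 4.5): number | null {
  const fits = BUCKETS_B.filter((b) => estimateVramGb(b, bitsPerWeight) <= vramGb);
  return fits.length > 0 ? fits[fits.length - 1] : null;
}

console.log(estimateVramGb(7, 4.5).toFixed(1)); // ≈ 4.7 GB for a 7b model at ~Q4
console.log(largestBucketThatFits(24)); // 34 → a 24 GB card fits a 34b at ~Q4
```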
I did look at auto-detecting before, but it seems like you can only really tell the features of a GPU, not the genuinely useful info (VRAM amount and bus speed). Is that the case?
I looked at the GPUAdapter docs, and all it told me was:
- device maker (amd)
- architecture (rdna-3)
and that was it. Is there a way to poke for bus speed and VRAM amount?
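For reference, this is roughly all you can poke at today (a sketch using the real WebGPU `GPUAdapter` surface; note that `limits.maxBufferSize` is a driver-imposed allocation cap, not the VRAM amount):

```ts
// Sketch of what WebGPU's adapter actually exposes (run in an async context).
async function probeAdapter(): Promise<void> {
  if (!("gpu" in navigator)) return; // WebGPU not supported
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return;
  // Only coarse identity fields: vendor ("amd"), architecture ("rdna-3"), etc.
  console.log(adapter.info.vendor, adapter.info.architecture);
  // A driver/browser allocation cap, NOT the physical VRAM amount.
  console.log(adapter.limits.maxBufferSize);
}
```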
I appreciate that it's a heavy site, but being honest with you: it doesn't seem worth the time to optimise this by moving to a lighter framework at this stage of the project.
Sorry!
- Use natural language to describe offloading requirements.

Do you mean remove the JSON thing and just summarise the offloading requirements?

- Just the launch year of the LLM (from the HF URL) can help show whether it’s an outdated or a cutting-edge LLM.

Great idea! I will try to add this tonight.

- VLMs/Embedding models are missing?

Yeah, I only have text-generation models at the moment, as that is by far where most of the interest is. I will look at adding other model types in a later update, but it won't be before the weekend.

Feature request: have a leaderboard of LLMs for x/y/z tasks (or pull one from an existing repo) and suggest the best model for a given GPU for an x/y/z task.
If there is a better model that my GPU can run, why should I go for the lowest?
Unfortunately, I’m not aware of any way to reliably get this type of information browser-side.
If you really want to see how far you can get, your best bet would be fingerprinting. That would require some upfront work: combining the manual input you already collect with data gathered in the background (especially timing-related data). With enough users manually entering their specs and enough independently collected data, you'd probably be surprised at how accurate fingerprinting can get.
That said, please do NOT go the fingerprinting route, because (1) a lot of users (including myself) hate being fingerprinted (especially if done covertly), (2) you need quite a lot of data before you can do anything useful, and (3) it’s obviously not worth the effort for what you’re building.