
578 points by huseyinkeles | 10 comments
1. mhitza ◴[] No.45571218[source]
Should be "that you can train for $100"

Curious to try it someday on a set of specialized documents. Though as I understand it, the cost of running this is whatever it takes to rent a GPU with 80GB of VRAM, which kind of leaves hobbyists and students out, unless some cloud is donating GPU compute capacity.

replies(2): >>45571268 #>>45571369 #
2. portaouflop ◴[] No.45571268[source]
If I have, let's say, 40GB of RAM, does it not work at all, or does it just take twice as long to train?
replies(1): >>45571442 #
3. Onavo ◴[] No.45571369[source]
A GPU with 80GB of VRAM costs around $1-3 an hour on commodity clouds (i.e. the non-Big-3 bare-metal providers, e.g. https://getdeploying.com/reference/cloud-gpu/nvidia-h100). I think that's accessible to most middle-class users in first-world countries.
replies(1): >>45571954 #
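
As a quick back-of-the-envelope on what those rates buy against the $100 training budget in the title (assuming a single rented 80GB card at the quoted range):

    budget_usd = 100  # the "$100" training budget from the title
    for rate_usd_per_hour in (1, 2, 3):  # the quoted $1-3/hr range for an 80GB GPU
        gpu_hours = budget_usd / rate_usd_per_hour
        print(f"at ${rate_usd_per_hour}/hr: {gpu_hours:.0f} GPU-hours for ${budget_usd}")
    # -> 100, 50, and ~33 GPU-hours respectively
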
4. typpilol ◴[] No.45571442[source]
Won't work at all. Or if it does, it'll be so slow (it'd have to go to disk for every single calculation) that it won't ever finish.
replies(1): >>45572601 #
5. antinomicus ◴[] No.45571954[source]
Isn’t the whole point to run your model locally?
replies(4): >>45572029 #>>45572031 #>>45572477 #>>45572856 #
6. theptip ◴[] No.45572029{3}[source]
No, that’s clearly not a goal of this project.

This is a learning tool. If you want a local model, you're almost certainly better off using something trained on far more compute (DeepSeek, Qwen, etc.).

7. yorwba ◴[] No.45572031{3}[source]
The 80 GB is for training with a batch of 32 sequences of 2048 tokens each. Since the model has only about 560M parameters, you could probably run it on a CPU, if a bit slowly.
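
A rough back-of-the-envelope on why inference is so much lighter than training at this size (the precision and optimizer choices below are my assumptions, not details taken from the repo):

    params = 560e6                      # ~560M parameters, per the comment above

    weights_bf16 = params * 2           # ~1.1 GB: bf16 weights, all you need for inference
    grads_bf16   = params * 2           # ~1.1 GB: gradients (training only)
    adam_state   = params * 4 * 2       # ~4.5 GB: fp32 first/second moments, if using AdamW
    master_fp32  = params * 4           # ~2.2 GB: fp32 master copy, common in mixed precision

    inference_gb = weights_bf16 / 1e9
    training_gb  = (weights_bf16 + grads_bf16 + adam_state + master_fp32) / 1e9
    print(f"inference weights: ~{inference_gb:.1f} GB")
    print(f"training weights + optimizer state: ~{training_gb:.1f} GB")
    # The remaining tens of GB during training go to activations for the
    # 32 x 2048-token batch; for CPU inference the ~1 GB of weights is the main cost.
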
8. jsight ◴[] No.45572477{3}[source]
I'd guess this will output tokens faster than the average reader can read, even with CPU-only inference on a modern-ish machine.

The param count is small enough that even cheap (<$500) GPUs would work too.
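
For a rough feel of what CPU-only inference costs at this scale, here's a self-contained PyTorch toy sized to roughly the same parameter count; the architecture and sizes are arbitrary stand-ins, not nanochat's actual model, and a real generation loop with a KV cache would be faster per token than this single full forward pass:

    import time
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    d_model, n_layers, n_heads, vocab = 1280, 24, 16, 50_000  # arbitrary sizes, ~600M params total
    layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
    model = nn.Sequential(
        nn.Embedding(vocab, d_model),          # token embeddings
        nn.TransformerEncoder(layer, n_layers), # transformer stack
        nn.Linear(d_model, vocab),              # output head over the vocabulary
    ).eval()

    n_params = sum(p.numel() for p in model.parameters())
    print(f"~{n_params / 1e6:.0f}M params")

    tokens = torch.randint(0, vocab, (1, 128))  # a 128-token dummy prompt
    with torch.inference_mode():
        start = time.time()
        logits = model(tokens)                  # one full forward pass on CPU
        print(f"forward pass: {time.time() - start:.2f}s on CPU")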

9. karpathy ◴[] No.45572601{3}[source]
It will work great with a 40GB GPU, probably a bit less than 2x slower. These are micro models of a few B params at most and fit easily during both training and inference.
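
One plausible reading of the "a bit less than 2x slower" figure (my interpretation, not a detail from the repo): with half the memory you halve the per-step batch and accumulate gradients over two microbatches, so each optimizer step does two forward/backward passes. A minimal sketch, where model, optimizer, loss_fn, and get_microbatch are placeholders for whatever the real loop uses:

    accum_steps = 2  # e.g. two microbatches of 16 x 2048 tokens instead of one 32 x 2048 batch

    def train_step(model, optimizer, loss_fn, get_microbatch):
        optimizer.zero_grad(set_to_none=True)
        for _ in range(accum_steps):
            x, y = get_microbatch()                    # half-size batch that fits in 40GB
            loss = loss_fn(model(x), y) / accum_steps  # scale so the summed gradient matches the full batch
            loss.backward()                            # gradients accumulate across microbatches
        optimizer.step()
        # ~2x the forward/backward work per optimizer step, hence roughly
        # "2x slower", minus the fixed per-step overhead that doesn't double.
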
10. simonw ◴[] No.45572856{3}[source]
You can run a model locally on much less expensive hardware. It's training that requires the really big GPUs.