
135 points lnyan | 2 comments
llm_trw ◴[] No.42731112[source]
>The estimated training time for the end-to-end model on an 8×H100 machine is 2.6 days.

That's a $250,000 machine for the micro budget. Or, if you don't want to do it locally, roughly $2,000 to rent time on someone else's machine for that one model.
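The ~$2,000 rental figure is easy to sanity-check with back-of-envelope arithmetic. The ~$4/GPU-hour H100 rate below is an assumed ballpark, not a number from the thread:

```python
# Rough check of the ~$2,000 cloud-rental estimate.
# rate_per_gpu_hour is an assumed ballpark H100 price, not from the thread.
gpus = 8
days = 2.6
rate_per_gpu_hour = 4.00

cost = gpus * days * 24 * rate_per_gpu_hour
print(round(cost))  # ≈ 2000
```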

replies(4): >>42731300 #>>42731535 #>>42732654 #>>42738646 #
1. GaggiX ◴[] No.42731300[source]
You can do it on one single GPU but you would need to use gradient accumulation and the training would probably last 1-2 months on a consumer GPU.
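The idea behind gradient accumulation is to sum gradients over several small micro-batches and take one optimizer step, so a single consumer GPU can emulate the large effective batch of a multi-GPU machine. A toy pure-Python sketch (the model, data, and hyperparameters here are illustrative, not from the paper):

```python
# Gradient accumulation, illustrated on a 1-parameter linear model y ≈ w * x.
# Accumulate gradients over `accum_steps` micro-batches, then step once.

def grad(w, batch):
    # gradient of mean squared error for y ≈ w * x
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(data, accum_steps=4, lr=0.05, epochs=200):
    w = 0.0
    for _ in range(epochs):
        g, n = 0.0, 0
        for i in range(0, len(data), 2):      # micro-batches of 2
            g += grad(w, data[i:i + 2])
            n += 1
            if n == accum_steps:              # one optimizer step per accum_steps
                w -= lr * (g / accum_steps)
                g, n = 0.0, 0
        if n:                                  # flush any leftover micro-batches
            w -= lr * (g / n)
    return w

data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]]
w = train(data)
print(round(w, 2))  # converges toward the true slope 3.0
```

The same step count as full-batch training, just spread over more wall-clock time per step, which is why the single-GPU run stretches to months.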
replies(1): >>42731829 #
2. programd ◴[] No.42731829[source]
Accepting the 1-2 month estimate at face value, we're firmly in hobby territory now. Any adult with a job and a GPU can train their own models for an investment roughly equivalent to a high-end gaming machine. Let's run some very hand-wavy numbers:

RTX 4090 ($3000) + CPU/Motherboard/SSD/etc ($1600) + two months at full power ($300) is only on the order of $5000 initial investment for the first model. After that you can train 6 models per year to your exact specifications for an extra $150 per month in power usage. This cost will go down.
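Writing the hand-wavy numbers out (the $150/month electricity figure is the comment's assumption, not a measured cost):

```python
# The hobbyist budget from the comment, spelled out.
rig = 3000 + 1600              # RTX 4090 + CPU/motherboard/SSD/etc.
power_per_month = 150          # assumed electricity cost at full power
months_per_model = 2

first_model = rig + power_per_month * months_per_model
marginal_per_model = power_per_month * months_per_model

print(first_model)             # 4900, i.e. on the order of $5000
print(12 // months_per_model)  # 6 models per year thereafter
print(marginal_per_model)      # $300 marginal cost per extra model
```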

I'm expecting an explosion of micro-AI models specifically tailored for very narrow use cases. I mean, Hugging Face already has thousands of models, but they're mostly reusing the aligned big corporate stuff. What's coming is an avalanche of infinitely creative micro-AI models, both good and bad. There are no moats.

It's going to be kind of like when teenagers got their hands on personal computers. Oh wait....