
347 points | kashifr | 1 comment
WhitneyLand ◴[] No.44502146[source]
Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure, code, and recipes to reproduce their work.

Looks like a ballpark figure of a million dollars of GPU time if you want to train one up for yourself (4,000 GPUs / 24 days).

Very nice write-up that's generous in sharing their learnings.

This is a solid and positive contribution.

replies(2): >>44502692 #>>44504060 #
YetAnotherNick ◴[] No.44502692[source]
It's 384 H100s for 24 days, costing less than half a million dollars.
replies(2): >>44503252 #>>44505653 #
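
(As a rough check on the half-a-million figure, a back-of-envelope sketch in Python; the 384 GPUs and 24 days come from the comment above, and the rental-rate framing is an assumption, not something stated in the thread.)

    # Back-of-envelope: total GPU-hours for 384 H100s over 24 days, and the
    # hourly rate implied by staying under a $500k budget. Only the GPU count
    # and duration come from the thread; the budget restates the
    # "less than half a million dollars" claim.
    gpus = 384
    days = 24
    gpu_hours = gpus * days * 24          # 221,184 GPU-hours
    budget = 500_000
    print(f"total GPU-hours: {gpu_hours:,}")
    print(f"implied rate to stay under budget: ${budget / gpu_hours:.2f}/GPU-hour")  # ~ $2.26/hr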
segmondy ◴[] No.44505653[source]
H100s are going for about $3/hr; 384 * 24 * 3 ≈ $28k
replies(6): >>44505754 #>>44505979 #>>44506134 #>>44507506 #>>44507964 #>>44509849 #
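
(For context, the arithmetic behind the ~$28k figure, sketched in Python; the $3/hr rate is taken from this comment and the 384 GPUs / 24 days from the parent comments.)

    # 384 GPUs * 24 hours * $3/hr covers one day of the run;
    # the 24-day training run quoted upthread is 24x that.
    gpus, rate = 384, 3.0
    per_day = gpus * 24 * rate            # $27,648, i.e. the ~$28k figure
    full_run = per_day * 24               # ~$663,552 for the full 24 days
    print(f"one day:  ${per_day:,.0f}")
    print(f"24 days:  ${full_run:,.0f}")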
social_quotient ◴[] No.44507964[source]
Runpod is worth a look for these on-demand workloads (https://www.runpod.io/pricing); I use it a lot for ffmpeg workloads.

Found this a few days ago, which might be neat for finding cheaper GPU compute: https://www.primeintellect.ai/

No affiliation with either