
345 points kashifr | 9 comments
WhitneyLand ◴[] No.44502146[source]
Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure, code, and recipes to reproduce their work.

Looks like ballpark a million dollars of GPU time if you want to train one up for yourself (4000 GPUs / 24 days).

Very nice write-up that's generous in sharing their learnings.

This is a solid and positive contribution.

replies(2): >>44502692 #>>44504060 #
YetAnotherNick ◴[] No.44502692[source]
It's 384 H100s for 24 days, costing less than half a million dollars.
replies(2): >>44503252 #>>44505653 #
1. segmondy ◴[] No.44505653[source]
H100s are going for about $3/hr: 384 * 24 * 3 ~ $28k
replies(6): >>44505754 #>>44505979 #>>44506134 #>>44507506 #>>44507964 #>>44509849 #
2. jazzyjackson ◴[] No.44505754[source]
Take this, brother: `*`. It may serve you well.
3. dr_kretyn ◴[] No.44505979[source]
The price just keeps dropping with each comment. Anyone going to estimate it even lower?

What's the source for $3/h?

replies(1): >>44506274 #
4. jrk ◴[] No.44506134[source]
This is indeed a reasonable cost estimate for competitive short-term H100 rentals (source: much SemiAnalysis coverage, and my own exploration of the market), but there is a critical error (besides the formatting glitch with `*`):

It was 24 days (576 hours) not 24 hours. $663,552 @ $3/hr.
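The corrected arithmetic above can be sanity-checked in a couple of lines of Python (the GPU count, duration, and hourly rate are the figures from the thread; the function name is my own):

```python
def training_cost(num_gpus: int, days: float, usd_per_gpu_hour: float) -> float:
    """Total rental cost = GPUs x hours x hourly rate."""
    hours = days * 24
    return num_gpus * hours * usd_per_gpu_hour

# 384 H100s for 24 days (576 hours) at $3/GPU/hr
print(training_cost(384, 24, 3.00))  # 663552.0
```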

replies(1): >>44509470 #
5. pests ◴[] No.44506274[source]
They calculated for only 24 hours, not 24 days, so their number is off by a factor of 24.
6. YetAnotherNick ◴[] No.44507506[source]
You can rent for $2.2/GPU/hr on-demand, and likely around $2/GPU/hr for an order this big.

[1]: https://datacrunch.io/products#H100
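At that on-demand rate, the thread's run (384 GPUs for 24 days) comes in just under half a million dollars. A quick sketch (the $2.2/GPU/hr rate is the quote above; everything else is the thread's figures):

```python
gpus, days = 384, 24
rate = 2.2  # $/GPU/hr, on-demand price quoted above

total = gpus * days * 24 * rate
print(f"${total:,.2f}")  # $486,604.80
```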

7. social_quotient ◴[] No.44507964[source]
Runpod is worth a look for these on-demand workloads: https://www.runpod.io/pricing I use it a lot for ffmpeg workloads.

Found this a few days ago, which might be neat for finding cheaper options: https://www.primeintellect.ai/

No affiliation with either

8. mromanuk ◴[] No.44509470[source]
According to Runpod's pricing page, you can run an H100 for $2.39/hr, which takes the total as low as $528,629.76.

WARNING: This is highly speculative and napkin math

H200 (141 GB HBM3, $3.99/h, ~1.4x perf): 216 cards x 24 h/day x 17 days = 88,128 GPU-hours = $351,630.72

B200 (192 GB HBM3e, $5.99/h, ~2.8x perf): 158 cards x 24 h/day x 9 days = 34,128 GPU-hours = $204,426.72

Probably wrong math; it should be more efficient and cheaper in practice. I doubt they have that many H100/H200 cards available for that long.

Source: I've only trained using RTX4090 and stuff like that with 8 cards.

Not affiliated in any way with Runpod.
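The napkin math above can be re-run in a few lines; note that the card counts, day counts, and $/GPU/hr rates are the poster's guesses (scaled by assumed per-card speedups), not measured figures:

```python
# (cards, days, $/GPU/hr) -- guesses from the comment above, not benchmarks
scenarios = {
    "H100": (384, 24, 2.39),
    "H200": (216, 17, 3.99),
    "B200": (158, 9, 5.99),
}

totals = {}
for name, (cards, days, rate) in scenarios.items():
    gpu_hours = cards * days * 24
    totals[name] = gpu_hours * rate
    print(f"{name}: {gpu_hours} GPU-hours -> ${totals[name]:,.2f}")
```

At these assumed rates this prints $528,629.76 for the H100 run, $351,630.72 for the H200 run, and $204,426.72 for the B200 run.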

9. lhl ◴[] No.44509849[source]
You can go much lower: https://gpulist.ai/