Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure: code and recipes to reproduce their work.
Looks like ballpark a million dollars of GPU time if you want to train one up yourself (4,000 GPUs for 24 days).
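For a rough sense of scale, here's the back-of-envelope math using only the figures above; note the per-GPU-hour rate is implied by the ballpark, not quoted in the write-up:

    4,000 GPUs * 24 days * 24 h = 2,304,000 GPU-hours
    $1,000,000 / 2,304,000 GPU-hours ~ $0.43 per GPU-hour

That implied rate would mean heavily discounted bulk capacity or owned hardware; on-demand cloud rates for high-end training GPUs are typically several times that, which would push the total correspondingly higher.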
Very nice write-up that's generous in sharing their learnings.
This is a solid and positive contribution.