Smollm3: Smol, multilingual, long-context reasoner LLM

1. WhitneyLand ◴[08 Jul 25 17:27 UTC] No.44502146[source]▶

Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure, code, recipes to reproduce their work.

Looks like ballpark a million dollars of GPU time if you want to train up one for yourself (4000 gpus/24 days).

Very nice write up that’s generous in sharing their learnings.

This is a solid and positive contribution.

replies(2): >>44502692 #>>44504060 #

2. YetAnotherNick ◴[08 Jul 25 18:31 UTC] No.44502692[source]▶

>>44502146 (TP) #

It's 384 H100s for 24 days, costing less than half a million dollars.

replies(2): >>44503252 #>>44505653 #

3. Imustaskforhelp ◴[08 Jul 25 19:33 UTC] No.44503252[source]▶

>>44502692 #

Pardon me, but is the dataset public.

Like if I really really just wanted to build it from scratch, could I do so? (not that I have that money but just curious)

replies(1): >>44503264 #

4. hynky ◴[08 Jul 25 19:35 UTC] No.44503264{3}[source]▶

>>44503252 #

yes, both core web datasets are publicly available as well as the rest

replies(1): >>44503289 #

5. Imustaskforhelp ◴[08 Jul 25 19:37 UTC] No.44503289{4}[source]▶

>>44503264 #

Thanks!

To be honest, if I might argue then that this is one of the best truly open source models that we have got.

There is AllenAI and (Elmo?) and there is also this one which does distributed training but I think this looks a lot like SOTA for 3B parameters to me.

Thanks for telling me, I am not going to lie, I am going to try to test it now! (Ima try some GGUF since ollama convenience)

replies(1): >>44504387 #

6. refulgentis ◴[08 Jul 25 21:03 UTC] No.44504060[source]▶

>>44502146 (TP) #

I spent about 10 minutes this AM cross-checking with Phi-4-mini benchmarks, as it was very odd to not include the leader in benchmarks and it seemed universally behind.

For context, I dev an LLM client, a core tenant is keeping local as close to cloud parity as much as is possible. (via llama.cpp)

Companies aren't taking local AI seriously on a sustained basis outside Microsoft.

Overall, I usually would bite my tongue. HF is a great citizen, and I doubt this'll be a one off. However, when I see superlatives affirmed, while leaving out the local SoTA for many many moons that is a godsend in this sector, I think it is good to, rather than shy away, stand up and say this.

replies(1): >>44504307 #

7. adrianlzt ◴[08 Jul 25 21:42 UTC] No.44504307[source]▶

>>44504060 #

From the blog post: "SmolLM3 supports tool calling, and its chat template incorporates two distinct sections for tool descriptions: XML Tools and Python Tools"

8. peatmoss ◴[08 Jul 25 21:52 UTC] No.44504387{5}[source]▶

>>44503289 #

OLMo: https://allenai.org/olmo

AFAIK, they were the first open everything model.

replies(1): >>44507803 #

9. segmondy ◴[09 Jul 25 01:53 UTC] No.44505653[source]▶

>>44502692 #

H100 are going for about $3/hr, 384243 ~ $28k

replies(6): >>44505754 #>>44505979 #>>44506134 #>>44507506 #>>44507964 #>>44509849 #

10. jazzyjackson ◴[09 Jul 25 02:11 UTC] No.44505754{3}[source]▶

>>44505653 #

Take this brother, \*, it may serve you well

11. dr_kretyn ◴[09 Jul 25 02:58 UTC] No.44505979{3}[source]▶

>>44505653 #

The price just keeps on dropping with each comment. Anyone going to estimate it for less?

What's the source for $3/h?

replies(1): >>44506274 #

12. jrk ◴[09 Jul 25 03:38 UTC] No.44506134{3}[source]▶

>>44505653 #

This is indeed a reasonable cost estimate for competitive short-term H100 rentals (source: much SemiAnalysis coverage, and my own exploration of the market), but there is a critical error (besides the formatting glitch with `*`):

It was 24 days (576 hours) not 24 hours. $663,552 @ $3/hr.

replies(1): >>44509470 #

13. pests ◴[09 Jul 25 04:10 UTC] No.44506274{4}[source]▶

>>44505979 #

They miscalculated only 24 hours, not 24 days, so their number is off by a factor of 24.

14. YetAnotherNick ◴[09 Jul 25 08:23 UTC] No.44507506{3}[source]▶

>>44505653 #

You can buy for $2.2/GPU/hr for on-demand and likely around $2 for this big order.

[1]: https://datacrunch.io/products#H100

15. diggan ◴[09 Jul 25 09:08 UTC] No.44507803{6}[source]▶

>>44504387 #

> AFAIK, they were the first open everything model.

GPT2 (released ~5 years ago?) was "open" in the sense that weights were available for download (sans license), exact datasets that were used where outlined, the architecture explained and so on, so I guess it was also "open" in the sense that Llama is "open", but neither would be "open source" which I'd feel pretty confident to label OLMo with.

So OLMo seems to be the first actually "open source" model, but maybe not "open" as in "downloadable" (which Facebook tries to call "open source").

16. social_quotient ◴[09 Jul 25 09:40 UTC] No.44507964{3}[source]▶

>>44505653 #

Runpod is worth a look for these on demand workloads https://www.runpod.io/pricing I use a lot for ffmpeg workloads.

Found this a few days ago which might be neat for finding cheaper https://www.primeintellect.ai/

No affiliation with either

17. mromanuk ◴[09 Jul 25 12:54 UTC] No.44509470{4}[source]▶

>>44506134 #

According to Runpod pricing page, you can run H100 for $2.39, it can go as lower as $528,629.76

WARNING: This is highly speculative and napkin math

H200 (141 GB HBM3 - $3.99/h - 1.4x perf) 216 x 24 x 17 = 88128h = 351.895,104 (17 days and 216 cards)

B200 (192 GB HBM3e - $5.99/h - 2.8x perf) 158 x 24 x 9 = 34128h = $204.426,72

Probably wrong math, should be more efficient and cheaper. Doubt that they have 100/200 cards available for that long.

Source: I've only trained using RTX4090 and stuff like that with 8 cards.

Not affiliated in any way with Runpod.

18. lhl ◴[09 Jul 25 13:31 UTC] No.44509849{3}[source]▶

>>44505653 #

You can go much lower: https://gpulist.ai/