Looks like ballpark a million dollars of GPU time if you want to train one up yourself (4,000 GPUs for 24 days).
Very nice write-up that's generous in sharing their learnings.
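For what it's worth, a quick back-of-the-envelope sketch of that ballpark. The $/GPU-hour rates below are my assumptions, not numbers from the write-up; committed/bulk pricing varies a lot:

```python
# Back-of-the-envelope training cost from the quoted 4,000 GPUs for 24 days.
# The hourly rates are assumed for illustration, not taken from the write-up.
gpus = 4000
days = 24
gpu_hours = gpus * days * 24  # 2,304,000 GPU-hours

for rate in (0.50, 1.00, 2.00):  # assumed $/GPU-hour
    print(f"${rate:.2f}/GPU-hr -> ${gpu_hours * rate / 1e6:.1f}M")
```

So the million-dollar figure holds at aggressive committed rates around $0.50/GPU-hour; at typical on-demand prices it would be several times that.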
This is a solid and positive contribution.
Like, if I really just wanted to build it from scratch, could I do so? (Not that I have that money, just curious.)
To be honest, I'd argue this is one of the best truly open-source models we've got.
There's AllenAI's OLMo, and there's also that one that does distributed training, but this looks a lot like SOTA for 3B parameters to me.
Thanks for telling me; not going to lie, I'm going to test it now! (I'll try a GGUF since Ollama is convenient.)
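In case it helps anyone else testing, a minimal sketch of what that looks like with Ollama's Python client (pip install ollama). The model tag is a placeholder for whichever GGUF you actually pull:

```python
# Minimal sketch: chat with a locally pulled GGUF model via Ollama's
# Python client. The model tag below is hypothetical -- substitute
# whatever you pulled with `ollama pull`.
import ollama

response = ollama.chat(
    model="some-3b-model:latest",  # hypothetical tag
    messages=[{"role": "user", "content": "Summarize yourself in one sentence."}],
)
print(response["message"]["content"])
```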
AFAIK, they were the first open-everything model.
GPT-2 (released ~5 years ago?) was "open" in the sense that the weights were available for download (sans license), the exact datasets used were outlined, the architecture explained, and so on. So I guess it was also "open" in the sense that Llama is "open", but neither would be "open source", a label I'd feel pretty confident applying to OLMo.
So OLMo seems to be the first actually "open source" model, though not the first "open" as in "downloadable" one (which is what Facebook tries to call "open source").