Smollm3: Smol, multilingual, long-context reasoner LLM

(huggingface.co)

350 points kashifr | 2 comments | 08 Jul 25 16:13 UTC


WhitneyLand [08 Jul 25 17:27 UTC] No.44502146

Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure, code, and recipes to reproduce their work.

Looks like roughly a million dollars of GPU time if you want to train one up for yourself (4000 GPUs for 24 days).
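A quick back-of-envelope sketch of that estimate, using only the figures quoted above (4000 GPUs, 24 days, about a million dollars). The variable names are illustrative, and no real rental price is assumed; the script only derives the per-GPU-hour rate those numbers would imply.

```python
# Back-of-envelope check using only the figures quoted in the comment.
gpus = 4000              # GPU count as quoted
days = 24                # training duration as quoted
budget_usd = 1_000_000   # "ballpark a million dollars"

# Total GPU-hours: GPUs x days x 24 hours per day.
gpu_hours = gpus * days * 24

# Hourly rate per GPU that would make the quoted budget hold.
implied_rate = budget_usd / gpu_hours

print(f"{gpu_hours:,} GPU-hours")
print(f"${implied_rate:.2f} per GPU-hour implied")
```

If the implied rate looks far from what you would actually pay for a GPU-hour, one of the three quoted inputs is probably off.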

Very nice write-up that’s generous in sharing what they learned.

This is a solid and positive contribution.

replies(2): >>44502692 #>>44504060 #

1. refulgentis [08 Jul 25 21:03 UTC] No.44504060

>>44502146 #

I spent about 10 minutes this AM cross-checking with Phi-4-mini benchmarks, as it was very odd to not include the leader in benchmarks and it seemed universally behind.

For context, I dev an LLM client, and a core tenet is keeping local (via llama.cpp) as close to parity with cloud as possible.

Outside Microsoft, companies aren't taking local AI seriously on a sustained basis.

Overall, I would usually bite my tongue. HF is a great citizen, and I doubt this'll be a one-off. However, when I see superlatives affirmed while the model that has been the local SoTA for many, many moons, and a godsend in this sector, is left out, I think it is better to stand up and say so than to shy away.

replies(1): >>44504307 #

2. adrianlzt [08 Jul 25 21:42 UTC] No.44504307

>>44504060 (TP) #

From the blog post: "SmolLM3 supports tool calling, and its chat template incorporates two distinct sections for tool descriptions: XML Tools and Python Tools"
