←back to thread

350 points kashifr | 2 comments | | HN request time: 0.382s | source
Show context
WhitneyLand ◴[] No.44502146[source]
Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure, code, recipes to reproduce their work.

Looks like ballpark a million dollars of GPU time if you want to train up one for yourself (4000 gpus/24 days).

Very nice write up that’s generous in sharing their learnings.

This is a solid and positive contribution.

replies(2): >>44502692 #>>44504060 #
1. refulgentis ◴[] No.44504060[source]
I spent about 10 minutes this AM cross-checking with Phi-4-mini benchmarks, as it was very odd to not include the leader in benchmarks and it seemed universally behind.

For context, I dev an LLM client, a core tenant is keeping local as close to cloud parity as much as is possible. (via llama.cpp)

Companies aren't taking local AI seriously on a sustained basis outside Microsoft.

Overall, I usually would bite my tongue. HF is a great citizen, and I doubt this'll be a one off. However, when I see superlatives affirmed, while leaving out the local SoTA for many many moons that is a godsend in this sector, I think it is good to, rather than shy away, stand up and say this.

replies(1): >>44504307 #
2. adrianlzt ◴[] No.44504307[source]
From the blog post: "SmolLM3 supports tool calling, and its chat template incorporates two distinct sections for tool descriptions: XML Tools and Python Tools"