Mostly SOTA performance at the 3B level. A notable addition to the small but truly open club of models that provide full disclosure: code and recipes to reproduce their work.
Looks like ballpark a million dollars of GPU time if you want to train one up yourself (4,000 GPUs for 24 days).
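For a rough sense of scale, here's the back-of-envelope math using only the figures above; note the per-GPU-hour rate is implied by the ballpark, not quoted in the write-up:

    4,000 GPUs * 24 days * 24 h = 2,304,000 GPU-hours
    $1,000,000 / 2,304,000 GPU-hours ~ $0.43 per GPU-hour

That implied rate would mean heavily discounted bulk capacity or owned hardware; on-demand cloud rates for high-end training GPUs are typically several times that, which would push the total correspondingly higher.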
Very nice write-up that's generous in sharing their learnings.
This is a solid and positive contribution.