262 points by rain1 | 1 comment
1. fossa1 No.44443183
It’s ironic: for years the open-source community tried to match GPT-3 (175B dense) with 30B–70B models + RLHF + synthetic data, yet the performance gap persisted.

Turns out size really did matter, at least at the base-model level. Only with the release of truly massive dense models (Llama 3.1 405B) or large-scale MoE models with tens of billions of active parameters (DeepSeek V3, DBRX, etc.) did we start seeing GPT-4-level reasoning emerge outside closed labs.
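
For a sense of what "size" means in the dense vs. MoE case, here's a rough back-of-the-envelope sketch in Python. The figures are the approximate publicly reported parameter counts; treat them as illustrative rather than exact.

    # Rough comparison: total vs. per-token active parameters.
    # Figures are approximate publicly reported counts, for illustration only.
    models = {
        # name: (total params, active params per token)
        "GPT-3 (dense)":     (175e9, 175e9),
        "Llama 3.1 405B":    (405e9, 405e9),
        "DeepSeek V3 (MoE)": (671e9,  37e9),
        "DBRX (MoE)":        (132e9,  36e9),
    }

    for name, (total, active) in models.items():
        print(f"{name:20} total={total/1e9:.0f}B  active/token={active/1e9:.0f}B")

The point of the numbers: the MoE models carry GPT-3-scale (or much larger) total capacity while activating only a fraction of it per token, which is roughly where the open models finally caught up.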