
Zamba2-7B

(www.zyphra.com)
282 points | dataminer | 1 comment
arnaudsm No.41843888
I'm tired of LLM releases that cherry-pick benchmarks. How does it compare to SOTA Qwen2.5/Phi-3.5?

Does anyone know an up-to-date independent leaderboard? LMSYS and LiveBench used to be great but have skipped most major models recently.

1. reissbaker No.41846615
Phi-3.5 is pretty bad in practice; the Phi series always benchmarks well on the popular benchmarks and then falls over IRL (or on less-popular benchmarks). It would be nice to see it against Qwen2.5, but AFAIK the Qwen team didn't release any evals for the 7B version, so I can see why the Zamba folks compared against other similar-sized models with published benchmarks.

In general, the idea with these hybrid SSM architectures is to show that you can get good results with fewer training tokens while significantly improving inference speed. Even if Qwen2.5 is better at MMLU etc., it used way more training tokens to get there (18T tokens for Qwen2.5 vs. 3T for Zamba2), so Zamba2 is still a pretty useful result.
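For scale, the token budgets quoted in this thread work out to a 6x gap. A throwaway sketch (figures taken from the comments here, not verified against the model papers):

```python
# Reported pretraining-token budgets, as quoted in this thread (approximate).
TRAINING_TOKENS = {
    "Qwen2.5": 18e12,   # ~18T tokens
    "Zamba2-7B": 3e12,  # ~3T tokens
    "Phi-3.5": 3.4e12,  # ~3.4T tokens
}

def token_ratio(a: str, b: str) -> float:
    """How many times more training tokens model `a` used than model `b`."""
    return TRAINING_TOKENS[a] / TRAINING_TOKENS[b]

print(f"Qwen2.5 vs Zamba2-7B: {token_ratio('Qwen2.5', 'Zamba2-7B'):.1f}x the tokens")
# → Qwen2.5 vs Zamba2-7B: 6.0x the tokens
```

So a comparable MMLU score at one-sixth the token budget would be a meaningful efficiency claim, independent of where the models rank head-to-head.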

TBD whether Zamba2 is actually good in real-world usage (Phi-3.5, for example, used only 3.4T tokens and got good public benchmark results; it's just not very good at anything other than the public benchmarks). But Jamba 1.5, another hybrid SSM architecture, did seem to do quite well on the LMSYS leaderboard (admittedly not a super effective measure these days, but it still feels less gameable than MMLU), so I'm moderately hopeful that this is a real architectural win and not just gamed benchmarks.