(goombalab.github.io)

63 points jxmorris12 | 1 comments | 08 Jul 25 19:12 UTC

Herring ◴[08 Jul 25 21:04 UTC] No.44504065[source]▶

I'm a bit bearish on SSMs (and hybrid SSM/transformers) because the leading open-weight models (DeepSeek, Qwen, Gemma, Llama) are all transformers. There's just no way none of them tried SSMs.

replies(5): >>44504164 #>>44504299 #>>44504738 #>>44505203 #>>44506694 #

1. mbowcut2 ◴[09 Jul 25 00:19 UTC] No.44505203[source]▶

>>44504065 #

I think I agree with you. My only rebuttal would be that it's this kind of thinking that's kept any leading players from trying other architectures in the first place. As far as I know, the SOTA for SSMs just doesn't suggest potential upsides significant enough to warrant significant R&D, at least not compared to the tried-and-true, established LLM methods. The decision might be something like: "Pay X to train a competitive LLM" vs. "Pay 2X to MAYBE train a competitive SSM".

The Tradeoffs of SSMs and Transformers