
63 points jxmorris12 | 1 comment
Herring ◴[] No.44504065[source]
I'm a bit bearish on SSMs (and hybrid SSM/transformers) because the leading open weight models (DeepSeek, Qwen, Gemma, Llama) are all transformers. There's just no way none of them tried SSMs.
replies(5): >>44504164 #>>44504299 #>>44504738 #>>44505203 #>>44506694 #
1. mbowcut2 ◴[] No.44505203[source]
I think I agree with you. My only rebuttal would be that it's this kind of thinking that's kept the leading players from trying other architectures in the first place. As far as I know, SOTA for SSMs just doesn't suggest potential upsides significant enough to warrant the R&D, at least not compared to the tried-and-true, established LLM methods. The decision might be something like: "Pay X to train a competitive LLM" vs. "Pay 2X to MAYBE train a competitive SSM".