I'm a bit bearish on SSMs (and hybrid SSM/transformer models) because the leading open-weight models (DeepSeek, Qwen, Gemma, Llama) are all pure transformers. There's just no way none of those labs tried SSMs.
replies(5):
I think architectures other than the transformer might lead to SOTA performance, but they remain relatively underexplored.