
65 points by jxmorris12 | 1 comment
Herring No.44504065
I'm a bit bearish on SSMs (and hybrid SSM/transformers) because the leading open-weight models (DeepSeek, Qwen, Gemma, Llama) are all transformers. There's just no way none of them tried SSMs.
replies(5): >>44504164 >>44504299 >>44504738 >>44505203 >>44506694
1. nextos No.44504299
Second-generation LSTMs (xLSTM) do have leading performance on zero-shot time series forecasting: https://arxiv.org/abs/2505.23719.

I think architectures other than the transformer might also reach SOTA performance, but they remain relatively underexplored.