I'm a bit bearish on SSMs (and hybrid SSM/transformer models) because the leading open-weight models (DeepSeek, Qwen, Gemma, Llama) are all pure transformers. There's just no way none of those labs tried SSMs.
replies(5):
I think architectures other than the transformer might lead to SOTA performance, but they remain relatively underexplored.