
113 points by sethkim | 1 comment
x-complexity No.44460557
The article assumes that there will be no architectural improvements or migrations in the future, and that Sparse MoE is here to stay. Not a great foundation to build on.

Personally, I'm rooting for RWKV / Mamba2 to pull through, somehow. There's been some work done to increase their reasoning depths, but transformers still beat them without much effort.

https://x.com/ZeyuanAllenZhu/status/1918684269251371164

replies(1): >>44461677 #
1. NetRunnerSu No.44461677
In fact, what you need is a dynamically sparse, live, hyper-fragmented Transformer MoE, rather than an RNN-style architecture that is destined to fall behind...

In neurobiological terms, the Transformer architecture is closer to the highly interconnected, global receptive field of biological neurons.

https://github.com/dmf-archive/PILF
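
For readers unfamiliar with the term: "dynamic sparse" MoE here refers to top-k expert routing, where a learned gate activates only a small subset of experts per token. Below is a minimal PyTorch sketch of that routing pattern; the class and parameter names (SparseMoE, n_experts, k) are illustrative assumptions, not code from the linked PILF repo.

    # Minimal sketch of top-k ("dynamic sparse") MoE routing.
    # Illustrative only -- names and sizes are assumptions, not PILF code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
            super().__init__()
            self.k = k
            self.gate = nn.Linear(d_model, n_experts)  # learned router
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model),
                              nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (tokens, d_model); each token is routed to only k of n_experts
            logits = self.gate(x)                              # (tokens, n_experts)
            topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # sparse selection
            weights = F.softmax(topk_vals, dim=-1)             # renormalise over the chosen k
            out = torch.zeros_like(x)
            for slot in range(self.k):
                idx = topk_idx[:, slot]
                w = weights[:, slot:slot + 1]
                for e, expert in enumerate(self.experts):      # plain loop for clarity, not speed
                    mask = idx == e
                    if mask.any():
                        out[mask] += w[mask] * expert(x[mask])
            return out

The contrast with an RNN/SSM-style fixed state: per-token compute stays roughly constant (only k experts run), while total capacity scales with n_experts.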