
113 points by sethkim | 1 comment
x-complexity No.44460557
The article assumes that there will be no architectural improvements or migrations in the future, and that Sparse MoE is here to stay. Not a great foundation to build on.

Personally, I'm rooting for RWKV / Mamba2 to pull through, somehow. There's been some work done to increase their reasoning depths, but transformers still beat them without much effort.

https://x.com/ZeyuanAllenZhu/status/1918684269251371164

replies(1): >>44461677 #
1. NetRunnerSu No.44461677
In fact, what you need is a dynamically sparse, live, hyper-fragmented Transformer MoE, rather than an RNN-style architecture that is destined to fall behind...

In neurobiological terms, the Transformer architecture is closer to the highly interconnected, global receptive field of biological neurons.

https://github.com/dmf-archive/PILF
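
For readers unfamiliar with the term: "dynamic sparse" MoE here refers to top-k expert routing, where a learned gate activates only a small subset of experts per token. Below is a minimal PyTorch sketch of that routing pattern; the class and parameter names (SparseMoE, n_experts, k) are illustrative assumptions, not code from the linked PILF repo.

    # Minimal sketch of top-k ("dynamic sparse") MoE routing.
    # Illustrative only -- names and sizes are assumptions, not PILF code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseMoE(nn.Module):
        def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
            super().__init__()
            self.k = k
            self.gate = nn.Linear(d_model, n_experts)  # learned router
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model),
                              nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            ])

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (tokens, d_model); each token is routed to only k of n_experts
            logits = self.gate(x)                              # (tokens, n_experts)
            topk_vals, topk_idx = logits.topk(self.k, dim=-1)  # sparse selection
            weights = F.softmax(topk_vals, dim=-1)             # renormalise over the chosen k
            out = torch.zeros_like(x)
            for slot in range(self.k):
                idx = topk_idx[:, slot]
                w = weights[:, slot:slot + 1]
                for e, expert in enumerate(self.experts):      # plain loop for clarity, not speed
                    mask = idx == e
                    if mask.any():
                        out[mask] += w[mask] * expert(x[mask])
            return out

The contrast with an RNN/SSM-style fixed state: per-token compute stays roughly constant (only k experts run), while total capacity scales with n_experts.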