I remain highly skeptical. I doubt that transformers are the best architecture possible, but they set a high bar. And it sure seems like the people who keep suggesting that "transformers aren't the future" aren't good enough to actually clear that bar.
If any midwit can say "X is deeply flawed" but no one can put together a Y that would beat X, then clearly, pointing out the flaws was never the bottleneck at all.
It's not a linear process, so I'm not sure the "bottleneck" analogy holds here.
We're not limited to talking about "the bottleneck". I think the argument is more that we're very close to optimal results for the current approach/architecture, so getting superior outcomes from AI will actually require meaningfully different approaches.
Ironically, the same could be said about Attention Is All You Need in 2017. It didn't drive any immediate improvements; decent Transformer models took a few years to arrive after that.