
169 points by mgninad | 2 comments
attogram (No.45072664)
"Attention Is All You Need" - I've always wondered if the authors of that paper used such a casual and catchy title because they knew it would be groundbreaking and massively cited in the future....
sivm (No.45073494)
Attention is all you need for what we have. But attention is a local heuristic: it gives us brittle coherence and no global state. I believe we need a paradigm shift in architecture to move forward.
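
For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of single-head scaled dot-product attention in plain NumPy (no masking, no learned projections; an illustration only, not any particular library's implementation). Each output position is a softmax-weighted average of value vectors drawn solely from the tokens in the current context window, which is the sense in which attention is "local" and keeps no state beyond that window.

    import numpy as np

    def attention(Q, K, V):
        # Q, K, V: (seq_len, d) arrays of query, key, and value vectors.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                   # pairwise similarities, (seq_len, seq_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the context window
        return weights @ V                              # per-position weighted sum of values

    # Self-attention over 5 tokens with 8-dimensional embeddings.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 8))
    out = attention(x, x, x)
    print(out.shape)  # (5, 8)

Everything the layer computes is a function of the tokens currently in view; nothing persists across contexts unless an outer system adds it.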
ACCount37 (No.45074245)
There's plenty of "we need a paradigm shift in architecture" going around - and no actual architecture that beats transformers at their strengths as far as the eye can see.

I remain highly skeptical. I doubt that transformers are the best architecture possible, but they set a high bar. And it sure seems like the people who keep suggesting that "transformers aren't the future" aren't good enough to actually clear that bar.

airstrike (No.45074490)
That logic does not hold.

Being able to provide an immediate replacement is not a prerequisite for pointing out limitations in current technology.

ACCount37 (No.45075281)
What's the value of "pointing out limitations" if it completely fails to drive any improvements?

If any midwit can say "X is deeply flawed" but no one can put together a Y that beats X, then clearly, pointing out the flaws was never the bottleneck at all.

airstrike (No.45076649)
I think you don't understand how primary research works. Pointing out flaws helps others think about those flaws.

It's not a linear process, so I'm not sure the "bottleneck" analogy holds here.

We're not limited to only talking about "the bottleneck". I think the argument is more that we're very close to optimal results for the current approach/architecture, so getting superior outcomes from AI will actually require meaningfully different approaches.

ACCount37 (No.45077664)
Where's that "primary research" you're talking about? I certainly don't see it happening here right now.

My point is: saying "transformers are flawed" is dirt cheap. Coming up with anything less flawed isn't.