From multi-head to latent attention: The evolution of attention mechanisms (vinithavn.medium.com)
169 points by mgninad | 1 comment | 30 Aug 25 05:45 UTC
mrtesthah | 30 Aug 25 07:05 UTC | No. 45072533
>>45072160 (OP)
Do we know if any of these techniques are actually used in the so-called "frontier" models?
Replies (3): >>45072588, >>45073417, >>45076391
zackangelo | 30 Aug 25 17:23 UTC | No. 45076391
>>45072533
Not quite a frontier model, but definitely built by a frontier lab: Grok 2 was recently open-sourced, and I believe it uses a fairly standard multi-head attention (MHA) architecture with mixture of experts (MoE).