Who knows what the closed-source models use, but going by what's happening in open models, all the big changes and corresponding gains in capability are in training techniques, not model architecture. Things like GQA and MLA, as discussed in this article, are important techniques for getting better scaling, but they're relatively minor tweaks compared to the evolution in training techniques.
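For anyone unfamiliar, the gist of GQA is just that several query heads share one key/value head, so the KV cache shrinks by the group factor. A rough sketch of that idea (head counts and sizes made up for illustration, no causal mask, not any particular model's code):

```python
import torch

n_q_heads, n_kv_heads, d_head, seq = 8, 2, 64, 16   # illustrative sizes
group = n_q_heads // n_kv_heads                      # 4 query heads per KV head

q = torch.randn(seq, n_q_heads, d_head)
k = torch.randn(seq, n_kv_heads, d_head)             # cached: 2 heads, not 8
v = torch.randn(seq, n_kv_heads, d_head)

# Expand K/V across the query-head groups at attention time.
k_exp = k.repeat_interleave(group, dim=1)            # (seq, 8, d_head)
v_exp = v.repeat_interleave(group, dim=1)

attn = torch.softmax(
    q.transpose(0, 1) @ k_exp.permute(1, 2, 0) / d_head**0.5, dim=-1
)
out = attn @ v_exp.transpose(0, 1)                   # (8, seq, d_head)
```

That's the whole trick: the math of attention is unchanged, you just store and compute far fewer K/V heads, which is why I'd call it a tweak rather than a new architecture.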
I suspect closed models aren't doing anything too radically different from what's presented here.