
579 points paulpauper | 1 comment
1. karmakaze No.43605984
> [...] But I would nevertheless like to submit, based off of internal benchmarks, and my own and colleagues' perceptions using these models, that whatever gains these companies are reporting to the public, they are not reflective of economic usefulness or generality. [...]

Seems like the author is looking at how the models fail and not considering how much they're improving in the ways they succeed.

The efficiency in DeepSeek's Multi-Head Latent Attention[0] is pure advancement.

[0] https://youtu.be/0VLAoVGf_74?si=1YEIHST8yfl2qoGY&t=816
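
To put a rough number on the efficiency point above, here is a back-of-the-envelope sketch of per-token KV-cache size under standard multi-head attention versus a latent-attention scheme that caches one compressed latent plus a small positional key. The function names and the dimensions below are illustrative assumptions, not figures from the comment or the linked video, and the arithmetic ignores everything except cache size.

```python
def kv_cache_elems_mha(n_heads: int, head_dim: int) -> int:
    """Elements cached per token per layer with standard multi-head
    attention: one key and one value vector for every head."""
    return 2 * n_heads * head_dim


def kv_cache_elems_mla(latent_dim: int, rope_dim: int) -> int:
    """Elements cached per token per layer with a latent-attention
    scheme: one shared compressed KV latent plus a small positional key."""
    return latent_dim + rope_dim


# Illustrative sizes only (assumed for this sketch).
N_HEADS, HEAD_DIM = 128, 128
LATENT_DIM, ROPE_DIM = 512, 64

mha = kv_cache_elems_mha(N_HEADS, HEAD_DIM)
mla = kv_cache_elems_mla(LATENT_DIM, ROPE_DIM)

print(f"MHA: {mha} elements/token/layer")
print(f"MLA: {mla} elements/token/layer")
print(f"reduction: {mha / mla:.1f}x")
```

With these assumed sizes the cache shrinks from 32768 to 576 elements per token per layer. Whether output quality holds up at that compression is a separate question this sketch doesn't address.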