210 points blackcat201 | 8 comments
1. logicartisan ◴[] No.45771056[source]
Amazing how fast AI keeps improving; every new model feels like a big step forward.
replies(1): >>45771240 #
2. hirako2000 ◴[] No.45771240[source]
It is solely improving on efficiency. While that is extremely valuable given the disproportionate (relative to value) costs of these things, your statement almost sounds as if it had improved on an even more challenging axis: pushing performance.
replies(4): >>45771821 #>>45771931 #>>45774235 #>>45774316 #
3. embedding-shape ◴[] No.45771821[source]
It's a generic comment that I don't think is even specifically about Kimi Linear or this submission; you could leave the same comment on almost any AI/ML submission and it would say just as much and be just as relevant (or irrelevant).
replies(1): >>45772166 #
4. giancarlostoro ◴[] No.45771931[source]
I've, uh, said this a few times. But AI is a bunch of people overpaying CS students to implement old algorithms and then realizing that they need Software Engineers to optimize the existing, known systems. Most of AI (if not ALL of it) as we know it today has existed in code for decades; we just never had the hardware for it.

A lot of the optimizations are not some groundbreaking new way to program; they're known techniques to any Software Engineer or Systems Engineer.

replies(1): >>45771990 #
5. embedding-shape ◴[] No.45771990{3}[source]
> A lot of the optimizations are not some ground breaking new way to program

Hindsight is a bitch, huh? Everything looks simple now once people have proved it kind of works, but I think you oversimplify "how easy it is".

Lots of stuff in ML, particularly in the last ~5 years or so, hasn't been "implementing old algorithms", although of course everything builds on the research that came before; we're standing on the shoulders of giants and all that.

6. hirako2000 ◴[] No.45772166{3}[source]
Agreed, it would apply to any method or system that improved on efficiency. That doesn't diminish the feat; I'm not trying to minimize the impact of Kimi Linear's gain. It is a novel and outstanding benefit applicable to LLMs.
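
(For a concrete sense of the kind of efficiency gain being discussed, here is a rough NumPy sketch — not Kimi Linear's actual design — contrasting standard softmax attention, whose cost grows with the square of the sequence length, with a generic kernelized linear-attention formulation whose cost grows linearly. The feature map phi and all shapes are illustrative assumptions.)

    import numpy as np

    def softmax_attention(Q, K, V):
        # Standard attention: materializes an n x n score matrix,
        # so time is O(n^2 * d) and memory is O(n^2).
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
        # Kernelized (non-causal) linear attention: accumulate a d x d
        # summary of keys/values instead of an n x n matrix, giving
        # O(n * d^2) time and O(d^2) extra memory -- linear in n.
        Qp, Kp = phi(Q), phi(K)
        state = Kp.T @ V              # (d, d)
        norm = Kp.sum(axis=0)         # (d,)
        return (Qp @ state) / (Qp @ norm)[:, None]

    n, d = 2048, 64
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)

(The point is only the asymptotics: the quadratic version materializes an n x n matrix while the linear one carries a d x d state, which is where this family of models gets its headroom.)
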
7. naasking ◴[] No.45774235[source]
There has been work that pushes performance too, like tiny recursive models. Applying LLMs in recursive loops also improves output, so efficiency gains make that approach viable, which can itself count as an improvement in performance.
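
(A toy sketch of the kind of recursive loop meant here, with generate as a stand-in for whatever model call you actually use; the prompts and round count are arbitrary assumptions.)

    def generate(prompt: str) -> str:
        # Placeholder for an actual LLM call (API or local model).
        raise NotImplementedError("plug in your model here")

    def refine(task: str, rounds: int = 3) -> str:
        # Draft, critique, and revise in a loop; each round costs another
        # full pass through the model, which is why cheaper inference
        # makes this style of usage practical.
        draft = generate(f"Solve this task:\n{task}")
        for _ in range(rounds):
            critique = generate(
                f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
                "List concrete mistakes or omissions in the draft."
            )
            draft = generate(
                f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
                "Rewrite the draft, fixing the issues raised."
            )
        return draft
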
8. acuozzo ◴[] No.45774316[source]
> It is solely improving on efficiency.

Consider the implications of increases in efficiency *when you hold compute constant*.

The win is far more obvious when it's "we can do more with what we have" instead of "we can do the same with less".
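
(A back-of-the-envelope illustration of that framing, with all numbers invented: if per-sequence attention cost falls from roughly n^2 to roughly k*n, holding the compute budget fixed buys a much longer context rather than just a cheaper bill.)

    # Assumed budget: what an 8192-token quadratic-attention pass costs.
    budget = 8192 ** 2
    quadratic_n = int(budget ** 0.5)   # n^2 <= budget  ->  n = 8192
    k = 64                             # assumed per-token constant for the linear variant
    linear_n = budget // k             # k*n <= budget  ->  n = 1,048,576
    print(quadratic_n, linear_n)       # ~128x more context for the same spend
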