7 points by cgel | 1 comment
1. cgel | No.45755526
We have trained a completely attention-free LLM whose performance is competitive with state-of-the-art models. This model, which we call Brumby-14B-Base, has a familiar Transformer-style architecture, except it uses power retention layers instead of attention layers. It is available on Huggingface.
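Since the model is on Huggingface, here is a minimal sketch of how a base model with a custom layer type would typically be loaded via the `transformers` library. The repo ID `manifestai/brumby-14b-base` and the need for `trust_remote_code` are assumptions on my part, not details confirmed in the comment.

```python
# Hedged sketch: loading a Huggingface-hosted base model whose architecture
# (power retention layers) is not built into transformers, so it likely ships
# as remote code. Repo ID below is assumed, not confirmed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "manifestai/brumby-14b-base"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,   # 14B parameters: bf16 keeps memory manageable
    device_map="auto",            # spread layers across available GPUs
    trust_remote_code=True,       # custom power-retention layers load as remote code
)

prompt = "The key difference between attention and power retention is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note this is a base (non-instruct) model, so plain text continuation like the above is the expected usage pattern rather than chat-style prompting.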