
Zamba2-7B

(www.zyphra.com)
282 points | by dataminer | 1 comment
iamronaldo No.41843139
Not transformer based?
replies(3): >>41843175 >>41843177 >>41843268
oatsandsugar No.41843175
On the page it states:

Our novel shared-attention architecture allows more parameters to be allocated to the Mamba2 backbone. In turn, the shared transformer block preserves the rich cross-sequence dependencies of the attention computation.

so it sounds like it is at least partly transformer based?
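
For anyone wondering what "shared attention" could mean in practice, here is a rough sketch (my own illustration, not Zyphra's code) of the idea the quote describes: a backbone made mostly of Mamba2-style blocks, with a single attention (transformer) block whose weights are reused at several depths, so attention contributes cross-sequence mixing without consuming many extra parameters. The Mamba2 block below is a stand-in stub, and all names and hyperparameters are made up for illustration.

    import torch
    import torch.nn as nn

    class Mamba2BlockStub(nn.Module):
        """Placeholder for a real Mamba2 (state-space) block; here just a gated MLP."""
        def __init__(self, dim):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.mix = nn.Sequential(
                nn.Linear(dim, 2 * dim), nn.SiLU(), nn.Linear(2 * dim, dim)
            )

        def forward(self, x):
            return x + self.mix(self.norm(x))

    class SharedAttentionBlock(nn.Module):
        """One attention block whose single set of weights is reused at several depths."""
        def __init__(self, dim, heads=8):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):
            h = self.norm(x)
            out, _ = self.attn(h, h, h, need_weights=False)
            return x + out

    class HybridBackbone(nn.Module):
        def __init__(self, dim=512, n_layers=12, attn_every=4):
            super().__init__()
            self.layers = nn.ModuleList([Mamba2BlockStub(dim) for _ in range(n_layers)])
            self.shared_attn = SharedAttentionBlock(dim)  # one set of attention weights
            self.attn_every = attn_every

        def forward(self, x):
            for i, layer in enumerate(self.layers):
                x = layer(x)
                if (i + 1) % self.attn_every == 0:
                    x = self.shared_attn(x)  # same block reused at every insertion point
            return x

    model = HybridBackbone()
    tokens = torch.randn(2, 16, 512)   # (batch, seq, dim)
    print(model(tokens).shape)         # torch.Size([2, 16, 512])

In a layout like this, most of the parameter budget sits in the sequence-mixing (Mamba2) blocks, while the reused attention block is what the page seems to mean by preserving "cross-sequence dependencies of the attention computation" cheaply.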