
Zamba2-7B

(www.zyphra.com)
282 points | by dataminer | 1 comment
iamronaldo No.41843139
Not transformer based?
replies(3): >>41843175 >>41843177 >>41843268
oatsandsugar No.41843175
On the page it states:

Our novel shared-attention architecture allows more parameters to be allocated to the Mamba2 backbone. In turn, the shared transformer block preserves the rich cross-sequence dependencies of the attention computation.

so it sounds like it is at least partly transformer based?
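
For anyone wondering what "shared attention" could mean in practice, here is a rough sketch (my own illustration, not Zyphra's code) of the idea the quote describes: a backbone made mostly of Mamba2-style blocks, with a single attention (transformer) block whose weights are reused at several depths, so attention contributes cross-sequence mixing without consuming many extra parameters. The Mamba2 block below is a stand-in stub, and all names and hyperparameters are made up for illustration.

    import torch
    import torch.nn as nn

    class Mamba2BlockStub(nn.Module):
        """Placeholder for a real Mamba2 (state-space) block; here just a gated MLP."""
        def __init__(self, dim):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.mix = nn.Sequential(
                nn.Linear(dim, 2 * dim), nn.SiLU(), nn.Linear(2 * dim, dim)
            )

        def forward(self, x):
            return x + self.mix(self.norm(x))

    class SharedAttentionBlock(nn.Module):
        """One attention block whose single set of weights is reused at several depths."""
        def __init__(self, dim, heads=8):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):
            h = self.norm(x)
            out, _ = self.attn(h, h, h, need_weights=False)
            return x + out

    class HybridBackbone(nn.Module):
        def __init__(self, dim=512, n_layers=12, attn_every=4):
            super().__init__()
            self.layers = nn.ModuleList([Mamba2BlockStub(dim) for _ in range(n_layers)])
            self.shared_attn = SharedAttentionBlock(dim)  # one set of attention weights
            self.attn_every = attn_every

        def forward(self, x):
            for i, layer in enumerate(self.layers):
                x = layer(x)
                if (i + 1) % self.attn_every == 0:
                    x = self.shared_attn(x)  # same block reused at every insertion point
            return x

    model = HybridBackbone()
    tokens = torch.randn(2, 16, 512)   # (batch, seq, dim)
    print(model(tokens).shape)         # torch.Size([2, 16, 512])

In a layout like this, most of the parameter budget sits in the sequence-mixing (Mamba2) blocks, while the reused attention block is what the page seems to mean by preserving "cross-sequence dependencies of the attention computation" cheaply.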