Multi-Token Attention
(arxiv.org)
152 points
fzliu
| 1 comment |
02 Apr 25 22:20 UTC
1. curiousfiddler | 03 Apr 25 01:54 UTC | No. 43563854 | >>43562384 (OP)
So, why would this extract more semantic meaning than multi-head attention? Isn't the whole point of multiple heads similar to how CNNs use multiple types of filters to extract different semantic relationships?