slacker news
Multi-Token Attention (arxiv.org)
152 points by fzliu | 1 comment | 02 Apr 25 22:20 UTC
bob1029 | 02 Apr 25 23:14 UTC | No. 43562889
>>43562384 (OP)
So we're proposing a multiplicative increase in something that already scales quadratically with context size?
I think we already have a bit of a bottleneck in memory bandwidth utilization.
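To make the scaling argument concrete, here is a rough FLOP count for the n × n attention score matrix. It is a minimal sketch, not the paper's method: `extra_ops_per_score` is a hypothetical constant standing in for any additional per-entry work (for example a small convolution over neighbouring scores), and the head sizes are illustrative assumptions. It shows that a constant per-entry factor keeps the cost quadratic in context length while multiplying it by a fixed ratio.

```python
def score_matrix_flops(n, d_head, n_heads, extra_ops_per_score=0):
    """Rough FLOP count for the attention scores Q K^T plus extra per-entry work.

    n: context length; d_head: head dimension; n_heads: number of heads.
    extra_ops_per_score: hypothetical number of extra multiply-adds applied
    to each of the n * n scores per head (an assumption, not from the paper).
    """
    # One multiply-add (2 FLOPs) per (query, key, head-dimension) triple, per head.
    base = 2 * n * n * d_head * n_heads
    # Extra multiply-adds on every entry of every head's n x n score matrix.
    extra = 2 * n * n * n_heads * extra_ops_per_score
    return base + extra


for n in (1024, 2048, 4096):
    plain = score_matrix_flops(n, d_head=128, n_heads=32)
    mixed = score_matrix_flops(n, d_head=128, n_heads=32, extra_ops_per_score=9)
    # Doubling n quadruples both counts; the ratio between them stays fixed.
    print(n, plain, mixed, mixed / plain)
```

Under these assumptions the overhead is a constant factor of 1 + 9/128 on top of the usual quadratic growth; whether that matters in practice depends on whether the kernel is limited by compute or by memory traffic.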
replies(4): >>43563169 >>43563334 >>43563390 >>43563970
1. kadushka | 03 Apr 25 00:21 UTC | No. 43563390
>>43562889
If you have a memory-bandwidth bottleneck, this method is great: it would put the idle compute to use.
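The reasoning here is the usual roofline argument: a kernel's runtime is bounded below by the larger of its compute time and its memory-transfer time. The sketch below uses hypothetical hardware numbers (`PEAK`, `BW`) and hypothetical kernel sizes, and it assumes the extra work adds FLOPs without adding memory traffic. That is an assumption about how such a method would be implemented, not a claim about the paper.

```python
def roofline_time(flops, bytes_moved, peak_flops, mem_bw):
    """Lower-bound runtime (seconds) under a simple roofline model:
    a kernel can finish no faster than its compute time or its memory time."""
    return max(flops / peak_flops, bytes_moved / mem_bw)


# Hypothetical accelerator: 1e15 FLOP/s peak compute, 3e12 bytes/s memory bandwidth.
PEAK = 1e15
BW = 3e12
ridge = PEAK / BW  # FLOPs per byte needed before compute becomes the limit (~333)

bytes_moved = 1e9               # hypothetical bytes read/written by the kernel
base_flops = 50 * bytes_moved   # 50 FLOP/byte, below the ridge: memory bound

t_base = roofline_time(base_flops, bytes_moved, PEAK, BW)
# Triple the arithmetic while moving the same bytes (the key assumption).
t_more = roofline_time(3 * base_flops, bytes_moved, PEAK, BW)

# Both are limited by memory transfer, so the extra FLOPs are effectively free.
print(ridge, t_base, t_more, t_base == t_more)
```

Once arithmetic intensity crosses the ridge point (about 333 FLOP/byte with these made-up numbers), extra FLOPs start adding to runtime again, so "idle compute" is only free up to that point.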