
113 points by sethkim | 1 comment
georgeburdell No.44458085
Is there math backing up the “quadratic” statement about LLM input size? At least in the traffic analogy, I imagine it’s exponential, but for small amounts exceeding some critical threshold a quadratic term is sufficient.
replies(1): >>44458325 #
gpm No.44458325
Every token has to calculate attention over every previous token, so attention takes O(sum_{i=0}^{n-1} i) work. Since sum_{i=0}^{n-1} i = n(n-1)/2, that expression is equivalent to O(n^2).

I'm not sure where you're getting an exponential from.
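A minimal Python sketch, assuming plain causal attention with no windowing or caching optimizations: it just counts the (query, key) pairs for a few context lengths, and doubling n roughly quadruples the count, i.e. quadratic rather than exponential growth.

    def attention_pairs(n: int) -> int:
        """Total (query, key) pairs in one causal attention pass over n tokens."""
        # Token i attends to the i tokens before it: sum_{i=0}^{n-1} i = n(n-1)/2.
        return sum(range(n))

    for n in (1_000, 2_000, 4_000, 8_000):
        print(f"n={n:>5,}: {attention_pairs(n):>12,} pairs")
    # Doubling n roughly quadruples the pair count: quadratic, not exponential.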
