
337 points | throw0101c | 1 comment
mikewarot No.44609878
I'm waiting for the other shoe to drop: someone coming out with an FPGA optimized for reconfigurable computing that lowers the cost of LLM compute by 90% or more.
replies(6): >>44609898 #>>44609932 #>>44610004 #>>44610118 #>>44610319 #>>44610367 #
pdhborges No.44609898
Where would that improvement come from? The hardware is already here to compute GEMMs about as fast as possible.
replies(1): >>44609936 #
leakyfilter No.44609936
Raw GEMM computation was never the real bottleneck, especially on newer GPUs. Feeding the matmuls, i.e. memory bandwidth, is where it's at.
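
To make the compute-vs-bandwidth point concrete, here's a back-of-the-envelope roofline sketch (the matrix sizes and the accelerator FLOPs/bandwidth ratio are illustrative assumptions, not vendor specs): a big training-style GEMM has high arithmetic intensity and is compute-bound, while a batch-1 LLM decode step is essentially a matvec and sits far below the hardware's ridge point, so it's starved for bandwidth no matter how many matmul units you add.

```python
# Roofline back-of-the-envelope (illustrative numbers only).
# For C = A @ B with A (m,k) and B (k,n):
#   FLOPs        = 2*m*n*k
#   bytes moved  = bytes_per_elem * (m*k + k*n + m*n), assuming each
#                  matrix is read/written exactly once (ideal caching).

def arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte of DRAM traffic for an fp16 GEMM, ideal data movement."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# Large square GEMM (training-style): intensity grows with problem size.
big = arithmetic_intensity(4096, 4096, 4096)      # ~1365 FLOPs/byte

# Batch-1 decode step: a 1x4096 activation against a 4096x4096 weight.
# Every weight byte is read for ~1 FLOP of useful work.
decode = arithmetic_intensity(1, 4096, 4096)      # ~1 FLOP/byte

# "Ridge point" of a hypothetical accelerator: peak FLOPs / peak bandwidth.
# A few hundred FLOPs/byte is roughly the shape of a modern GPU.
ridge = 1000e12 / 3.3e12                          # ~303 FLOPs/byte

print(f"big GEMM intensity: {big:.0f} FLOPs/byte (compute-bound, above ridge)")
print(f"decode intensity:   {decode:.2f} FLOPs/byte (memory-bound, below ridge)")
print(f"ridge point:        {ridge:.0f} FLOPs/byte")
```

Anything left of the ridge point is limited by memory bandwidth, not matmul throughput, which is why faster GEMM units alone don't speed up decoding.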