
337 points by throw0101c | 1 comment
mikewarot ◴[] No.44609878[source]
I'm waiting for the other shoe to drop: someone comes out with an FPGA optimized for reconfigurable computing and lowers the cost of LLM compute by 90% or more.
replies(6): >>44609898 #>>44609932 #>>44610004 #>>44610118 #>>44610319 #>>44610367 #
crystal_revenge ◴[] No.44610004[source]
This is where I do wish we had more people working on the theoretical CS side of things in this space.

Once you recognize that all ML techniques, including LLMs, are fundamentally compression techniques, you should be able to come up with estimates of the minimum feasible size of an LLM based on: the amount of information that can be encoded in a given number of parameters, the relationship between information loss and model performance, and the information content of the original training data.

I simultaneously believe LLMs are bigger than they need to be, but suspect they need to be larger than most people think, given that you are trying to store a fantastically large amount of information. Even with lossy compression (which, ironically, is what makes LLMs "generalize"), we're still talking about an enormous corpus of data to represent.
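
To make that concrete, here's a minimal back-of-envelope sketch in Python. The corpus size, bits per token, retained fraction, and bits-per-parameter capacity are all assumed values for illustration, not measured ones:

    # Back-of-envelope lower bound on LLM parameter count, treating the model
    # as a lossy compressor of its training corpus. Every constant here is an
    # assumption for illustration, not a measured value.

    def min_params_estimate(corpus_tokens, bits_per_token, retained_fraction, bits_per_param):
        """Rough lower bound on parameters needed to retain a slice of the corpus."""
        required_bits = corpus_tokens * bits_per_token * retained_fraction
        return required_bits / bits_per_param

    # Example: 10T training tokens, ~10 bits of irreducible entropy per token,
    # the model retains ~1% of that information, ~2 bits stored per parameter.
    params = min_params_estimate(10e12, 10.0, 0.01, 2.0)
    print(f"~{params / 1e9:.0f}B parameters")  # ~500B under these assumptions

The interesting part is how sensitive the answer is to the retained fraction and the bits-per-parameter figure, which is exactly where better theory would help.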

replies(1): >>44610083 #
1. sfpotter ◴[] No.44610083[source]
Getting theoretical results along these lines that can be operationalized meaningfully is... really hard.