
602 points emrah | 1 comment
api ◴[] No.43744419[source]
When I see 32B or 70B models performing similarly to 200+B models, I don't know what to make of it. Either the larger models contain more breadth of information but we have managed to distill similar latent capabilities into smaller ones, or the larger models are just less efficient, or the tests are not very good.
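A minimal sketch of the "distill latent capabilities" idea mentioned above: train a small student model to match a large teacher's softened output distribution rather than the hard labels alone. This assumes a PyTorch-style setup; the function name, temperature value, and shapes are illustrative assumptions, not anything stated in the thread.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Soft-label distillation (Hinton et al., 2015): the student is
    pushed toward the teacher's softened probability distribution."""
    # Soften both distributions with the temperature so the teacher's
    # "dark knowledge" (relative probabilities of wrong answers) is visible.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 so
    # gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_probs, soft_targets,
                    reduction="batchmean") * temperature ** 2
```

Under this reading, a 32B student can end up close to a 200+B teacher on benchmarks because it only has to reproduce the teacher's behavior, not rediscover it from raw data.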
replies(2): >>43744582 #>>43744783 #
1. simonw ◴[] No.43744582[source]
It makes intuitive sense to me that this would be possible, because LLMs are still mostly opaque black boxes. I expect you could drop a whole bunch of the weights without having a huge impact on quality - maybe you end up mostly ditching the parts that are derived from shitposts on Reddit but keeping the bits from arXiv, for example.

(That's a massive simplification of how any of this works, but it's how I think about it at a high level.)
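One concrete version of "dropping a whole bunch of the weights" is magnitude pruning: zero out the lowest-magnitude parameters and keep the rest, which at moderate sparsity often costs surprisingly little quality. A minimal NumPy sketch; the function name and the 50% sparsity level are illustrative assumptions, not the method anyone in the thread is describing.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest
    absolute values; larger-magnitude weights pass through unchanged."""
    # Threshold chosen so that `sparsity` of the entries fall below it.
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

# Example: prune half the entries of a random weight matrix.
w = np.random.randn(4, 4)
w_pruned = magnitude_prune(w, sparsity=0.5)
```

Real pruning pipelines usually fine-tune afterwards to recover quality, but the core intuition is the same as the comment's: a lot of the parameters turn out not to matter much.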