
255 points by tbruckner | 1 comment
adam_arthur ◴[] No.37420461[source]
Even a linear growth rate in average RAM capacity would, in short order, obviate the need to run current SOTA LLMs remotely.

Historically, average RAM capacity has grown far faster than linearly, and there really hasn't been anything pressing manufacturers to push the envelope here in the past few years... until now.

It could be that LLM model sizes keep increasing such that we continue to need cloud inference, but I suspect model sizes will not grow as quickly as the hardware for inference does.
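For a sense of scale, here is a rough back-of-envelope sketch (Python) of the RAM footprint of an LLM's weights at different quantization levels. The parameter counts and the ~20% overhead factor for KV cache and activations are illustrative assumptions, not figures from this thread:

    # Rough RAM footprint for holding an LLM's weights in memory.
    # Parameter counts, quantization levels and the overhead factor are
    # illustrative assumptions, not specs of any particular model.

    def model_footprint_gb(params_billions, bits_per_weight, overhead=1.2):
        """Weights plus ~20% headroom for KV cache and activations, in GB."""
        bytes_per_weight = bits_per_weight / 8
        return params_billions * 1e9 * bytes_per_weight * overhead / 1e9

    for params in (7, 70, 400):        # hypothetical model sizes, billions of parameters
        for bits in (16, 8, 4):        # fp16, int8, 4-bit quantization
            print(f"{params:>4}B @ {bits:>2}-bit ~= {model_footprint_gb(params, bits):6.0f} GB")

By that arithmetic, a 70B-parameter model quantized to 4 bits needs on the order of 40 GB, already within reach of high-end consumer machines, while a 400B model at 16 bits is closer to a terabyte.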

Given how useful GPT-4 is already, maybe one more iteration would unlock the vast majority of practical use cases.

I think people will be surprised that consumers ultimately end up benefiting far more from LLMs than the providers do. There's not going to be much moat or differentiation to defend margins... more of a race to the bottom on pricing.

replies(8): >>37420537 #>>37420948 #>>37421196 #>>37421214 #>>37421497 #>>37421862 #>>37421945 #>>37424918 #
tomohelix ◴[] No.37420537[source]
RAM is easy. The hard part is making a unified-memory SoC like Apple's. From what I know, Apple's performance is almost magic. And whatever Apple is making, they're already at peak capacity and can't make more even if they wanted to. Nobody else has comparable technology. Apple is in its own league.
replies(1): >>37426757 #
1. AnthonyMouse ◴[] No.37426757[source]
Apple is just using a wide memory bus, the same as GPUs and server-class x86 CPUs do. It's not even hard; it's just not something desktop CPUs previously had any use for, so the current sockets don't support it.
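To make the wide-bus point concrete, here is a small sketch of the peak-bandwidth arithmetic (bandwidth ~ bus width in bytes x transfer rate). The bus widths and transfer rates are approximate, commonly cited figures used here as assumptions for illustration, not exact product specs:

    # Peak theoretical memory bandwidth ~= bus width (bytes) * transfers per second.
    # The configurations below are approximate/illustrative, not exact product specs.

    def bandwidth_gb_s(bus_width_bits, mega_transfers_per_sec):
        return bus_width_bits / 8 * mega_transfers_per_sec * 1e6 / 1e9

    configs = {
        "Desktop, dual-channel DDR5-5600 (128-bit)":      (128, 5600),
        "Wide unified-memory SoC, ~1024-bit LPDDR5-6400": (1024, 6400),
        "GPU, 384-bit GDDR6X @ 21 GT/s":                  (384, 21000),
    }

    for name, (width, rate) in configs.items():
        print(f"{name:<50} ~{bandwidth_gb_s(width, rate):5.0f} GB/s")

The roughly 10x gap between a 128-bit desktop bus and a ~1024-bit unified-memory bus is the whole trick: same DRAM technology, just many more channels wired to the package.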

And you could do the same thing without even changing the socket by including RAM on the CPU package as an L4 cache. Some of the Intel server CPUs are already doing this.