> Why wouldn’t 3rd party hardware vendors continue to work on reducing costs of running models locally?
Everyone wants this to happen, and they are all trying, but...
EUV, which has gotten us down to 3nm and below, is HARD. Shrinking feature size has led to higher density and lower costs. But now yields are DOWN, and the design concessions needed to make the processes work are hurting both cost and performance. There are a lot of hopes and prayers riding on the 1.8nm nodes, but things look grim.
Power is a massive problem for everyone. It is a MASSIVE problem IN the data center, and it is a problem for GPUs at home. Considering that "locally" means a PHONE for most people, it's an even bigger problem. And with all this power come cooling issues. The industry is starting to look at all sorts of interesting ways to move heat away from the cores... ways that don't involve air.
Design has hit a wall as well. If you look at the IPC (that's Instructions Per Clock) of NVIDIA's latest offering, you will find it is flat. The only gains between the latest generation and the previous one have come from small frequency upticks. Those gains came from using "more power!!!", and that's a problem because...
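To make the "flat IPC" point concrete, here's a back-of-the-envelope sketch. Throughput is roughly IPC x clock x core count, so if IPC and core count don't move, the whole generational gain is just the frequency bump (all numbers below are hypothetical, for illustration only):

```python
# Rough model: instructions/sec = IPC x clock x core count.
# All figures are made up to illustrate the scaling, not real chip specs.

def throughput(ipc, clock_ghz, cores):
    """Approximate instructions per second."""
    return ipc * clock_ghz * 1e9 * cores

prev_gen = throughput(ipc=4.0, clock_ghz=2.2, cores=128)
next_gen = throughput(ipc=4.0, clock_ghz=2.4, cores=128)  # IPC flat, clock +~9%

gain = next_gen / prev_gen - 1.0
print(f"generational gain: {gain:.1%}")  # entirely from frequency, i.e. from power
```

With IPC and core count held flat, the ~9% gain here is exactly the clock uptick, and since dynamic power scales at least linearly with frequency (worse once voltage rises too), that gain is bought with watts.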
Memory is a problem. There is a reason the memory chips for GPUs are soldered onto the board right next to the processor, and a reason laptops solder theirs on too: the signal integrity needed for that bandwidth doesn't survive a socket. CAMM tries to fix some of this, but the results are, to say the least, disappointing so far.
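This matters directly for running models locally, because token generation is memory-bandwidth bound: each generated token has to stream roughly the whole model's weights through the processor. A rough ceiling, with ballpark (not vendor-spec) bandwidth figures:

```python
# Upper bound on local LLM decode speed, bandwidth-bound case:
#   tokens/sec <= memory bandwidth / bytes read per token (~ model size).
# Bandwidth numbers below are rough ballparks, not specific products.

def max_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    # Each token requires streaming (roughly) every weight once.
    return bandwidth_gb_s / model_size_gb

model_gb = 8.0  # e.g. a ~8B-parameter model at 8-bit quantization

for name, bw in [("soldered GDDR (GPU)", 1000.0),
                 ("soldered LPDDR (laptop)", 120.0),
                 ("socketed DDR (desktop)", 64.0)]:
    print(f"{name:24s} ~{max_tokens_per_sec(bw, model_gb):6.1f} tok/s ceiling")
```

The gap between the soldered and socketed rows is the whole argument: compute is not the bottleneck, getting bytes to the compute is, and that's exactly the part that isn't getting cheaper.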
All of this has been hitting CPUs slowly, but there we have also had the luxury of "more cores" to throw at things. Go back 10-15 years and a top-end server looks about the same as a top-end desktop today (core count, single-core perf). Because of all of the above issues, I don't think you are going to get 700+ core consumer desktops in a decade (the current high end for server CPUs)... because of power, cost, etc.
Unless we see some foundational breakthrough in hardware (it could happen), you won't see the normal generational lift in performance that you have in the past (and I would argue we already haven't been seeing it). Someone is going to have to make MAJOR investments on the software side, and there is NO MOAT in doing so. Simply put, it's a bad investment... and if we can't lower the cost of compute (and it looks like we can't), it's going to be hard for small players to get in and innovate.
It's likely you're seeing a very real wall.