The hardware for a local model would cost the equivalent of years and years of a $20/mo subscription, produce lower-quality output, and run much slower.
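Back-of-envelope, with assumed numbers (a ~$2,500 build and the $20/mo plan; none of these are quotes):

```python
# How many months of a $20/mo subscription one local build costs.
# All figures are illustrative assumptions, not quotes.
build_cost = 2500.0   # assumed: 24 GB-class GPU plus the rest of the box, USD
sub_per_month = 20.0  # the subscription from the comment above
elec_per_month = 5.0  # assumed: ~350 W a couple hours/day at ~$0.15/kWh

months = build_cost / (sub_per_month - elec_per_month)
print(f"~{months:.0f} months (~{months / 12:.0f} years) to break even")
# -> ~167 months (~14 years)
```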
3.7 Thinking is an insane programming model. Maybe it cannot do an SWE's job, but it sure as hell can write functional narrow-scope programs with a GUI.
Local hosting on GPU only really makes sense if you're doing many hours of training/inference daily.
Also "many hours of inference daily" may mean you're doing your usual stuff daily while running some processing in the background that takes hours/days or you've put together some reactive automation that runs often all the time.
ps. local training rarely makes sense.
ps. 2. Not sure where you got 50x slower from; a 4090 is actually faster than an A100, for example, and a 5090 is ~75% faster than a 4090.
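If you'd rather measure than argue from spec sheets, here's a minimal PyTorch fp16 matmul check you can run on any of these cards. It's a raw-compute proxy only; LLM inference is often memory-bandwidth-bound, so treat the numbers as rough:

```python
# Quick fp16 matmul throughput check as a rough proxy for GPU compute speed.
import time
import torch

assert torch.cuda.is_available()
n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)

for _ in range(3):  # warmup so clocks and kernels settle
    a @ b
torch.cuda.synchronize()

iters = 20
t0 = time.perf_counter()
for _ in range(iters):
    a @ b
torch.cuda.synchronize()
dt = time.perf_counter() - t0

tflops = 2 * n**3 * iters / dt / 1e12  # 2*n^3 FLOPs per matmul
print(f"{torch.cuda.get_device_name()}: {tflops:.1f} fp16 TFLOPS")
```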