This entire paradigm gets turned on its head with AI. I tried to build this on purely local compute, and it's a bad play: we don't have good edge compute yet.
1. Many good models require more VRAM than anything outside a data center GPU can offer.
2. For models that can run locally (Flux, etc.), performance varies dramatically between top-of-the-line cards and older GPUs. You end up serving different models with different sampling techniques to each hardware class (see the sketch after this list).
3. GPU hardware is expensive and most consumers don't have a GPU at all. Requiring one severely limits your TAM (total addressable market).
4. Mac support is horrible, which alienates half of your potential customers.
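To make point 2 concrete, here is a minimal sketch of what tiering by hardware class looks like in practice. The tier table, model/sampler pairings, and the `pick_model_tier` helper are hypothetical illustrations, not any real product's routing logic; the only real dependency is `nvidia-smi`.

```python
# Illustrative sketch: route users to a model/sampler tier based on local VRAM.
# MODEL_TIERS and pick_model_tier are made-up names for this example.
import subprocess

MODEL_TIERS = {
    # minimum VRAM (GiB) -> hypothetical model/sampler pairing for that class
    24: ("flux-dev", "full-quality sampler"),
    12: ("flux-schnell", "distilled few-step sampler"),
    0: ("remote", "fall back to data-center inference"),
}

def local_vram_gib() -> float:
    """Best-effort VRAM query via nvidia-smi; returns 0 if no NVIDIA GPU is found."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        return float(out.splitlines()[0]) / 1024  # MiB -> GiB
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return 0.0

def pick_model_tier() -> tuple[str, str]:
    """Pick the richest tier the local card can handle, else punt to the data center."""
    vram = local_vram_gib()
    for min_gib, tier in sorted(MODEL_TIERS.items(), reverse=True):
        if vram >= min_gib:
            return tier
    return MODEL_TIERS[0]

if __name__ == "__main__":
    model, sampler = pick_model_tier()
    print(f"Serving {model} with {sampler}")
```

Every branch in that table is another configuration you have to test, benchmark, and keep in sync, which is exactly the maintenance burden the list above is warning about.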
It's best to follow the Cursor model, where the data center is a necessary evil and the local software acts as an adapter and visualizer for the local file system.
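A minimal sketch of that split, assuming a thin local client and a remote inference endpoint: the URL, payload shape, and helper names below are hypothetical, not Cursor's actual API.

```python
# Sketch of "thin local adapter, heavy remote compute".
# API_URL and the request/response shapes are invented for illustration.
from pathlib import Path
import requests

API_URL = "https://api.example.com/v1/generate"  # hypothetical data-center endpoint

def gather_context(project_dir: str, max_bytes: int = 32_000) -> str:
    """Local adapter: read a bounded slice of the project so the server never touches the disk."""
    chunks, used = [], 0
    for path in sorted(Path(project_dir).rglob("*.py")):
        text = path.read_text(errors="ignore")
        if used + len(text) > max_bytes:
            break
        chunks.append(f"# file: {path}\n{text}")
        used += len(text)
    return "\n\n".join(chunks)

def run_remote(prompt: str, project_dir: str) -> str:
    """All heavy inference happens in the data center; the client ships context and renders the result."""
    payload = {"prompt": prompt, "context": gather_context(project_dir)}
    resp = requests.post(API_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["output"]  # hypothetical response field
```

The local piece stays cheap and portable (it works the same on a MacBook or an old laptop), while the GPU problem is solved once, centrally, instead of per customer machine.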