Apple may have been trying to do everything locally at some point, but they appear to have recently moved away from that idea in favor of using OpenAI.
I can understand why: you only use a locally-run AI model every so often (maybe a few times a day), but when you do, you still want it to be fast. Delivering that would require a fairly powerful AI chip in your phone, one that spends most of its time idling.
Since compute costs for AI are enormous, it only makes sense to optimize for utilization and do the inference in the cloud.
Maybe at some point local AI will become viable, but the cloud will always be able to run much more powerful models, simply because the economics make far more sense there.