I think they've likely done a great job implementing it and think it will significantly improve iPhone security. I dislike the over the top marketing resembling a technical blog post. It's as if they've deployed CHERI in production with near 0 overhead rather than an incremental improvement over what standard ARM Cortex cores shipped years ago which people have been using in production.
Others are aware of where MTE needs improvement and are working on it for years. Cortex shipped MTE with a side channel issue which is better than not shipping it and it will get addressed. Apple has plenty of their own side channel vulnerabilities for their CPUs. Deterministic protections provided via MTE aren't negatively impacted by the side channel and also avoid depending on only 4 bits of entropy. The obvious way to use MTE is not the only way to use it.
GrapheneOS began using MTE in production right after the Pixel 8 provided a production quality implementation, which was significantly later than it could have been made available since Pixels aren't early adopters of new Cortex cores. On those cores, asynchronous MTE is near free and asymmetric is comparable to something like -fstack-protector-strong. Synchronous is relatively expensive, so making that perform better than the early Cortex cores providing MTE seems to be where Apple made a significant improvement. Apple has higher end, larger cores than the current line of Cortex cores. Qualcomm's MTE implementation will be available soon and will be an interesting comparison. We expect Android to heavily adopt it and therefore it will be made faster out of necessity. The security advantage of synchronous over asymmetric for userspace is questionable. It's clearer within the kernel, where little CPU time is spent on an end user device. We use synchronous in the kernel and asymmetric in userspace. We haven't offered full synchronous as an option mainly because we don't have any example of it making a difference. System calls act as a synchronization point in addition to reads. io_uring isn't available beyond a few core processes, etc.