Doing this encrypted is very slow: without hardware acceleration or special tricks, running the circuit is roughly 1 million times slower than unencrypted, or about 1ms per gate.
When you think about all the individual logic gates involved in just a matrix multiplication, and scale it up to a diffusion model or large transformer, it gets infeasible very quickly.
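To put a rough number on that, here is a back-of-the-envelope sketch in Python. The 1ms per gate figure is the one quoted above; the gate count per multiply-accumulate (`GATES_PER_MAC`) and the matrix sizes are placeholder assumptions of mine, not measured values, so treat the output as order-of-magnitude only.

```python
# Back-of-the-envelope cost of a gate-by-gate encrypted matrix multiply.
SECONDS_PER_GATE = 1e-3  # ~1 ms per encrypted gate (figure quoted above)
GATES_PER_MAC = 1_000    # ASSUMPTION: rough gate count for one low-precision multiply-accumulate


def encrypted_matmul_seconds(n: int, gates_per_mac: int = GATES_PER_MAC) -> float:
    """Estimated seconds for a naive (n x n) @ (n x n) matmul, one gate at a time."""
    macs = n ** 3  # a naive matmul needs n^3 multiply-accumulates
    return macs * gates_per_mac * SECONDS_PER_GATE


if __name__ == "__main__":
    for n in (16, 64, 256):
        days = encrypted_matmul_seconds(n) / 86_400  # seconds per day
        print(f"n={n:4d}: about {days:,.2f} days")
```

Under these assumptions a single 256x256 matmul already costs months of serial compute, and a transformer does many much larger ones per token.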
For some numbers: a ResNet-20 inference can be done in CKKS in about 5 minutes on CPU. With custom changes to the architecture you can get that under one minute, and in my view hardware acceleration will improve it by at least another factor of 10-100, so I'd expect roughly 1s inference for these (still small) networks within the next year or two.
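As a quick sanity check on that estimate, the arithmetic can be written out as below. The inputs are just the figures from this paragraph, with "less than one minute" taken as a 60s upper bound; the function name is made up for illustration.

```python
# Projected ResNet-20 latency under the speedups quoted above.
TUNED_ARCH_SECONDS = 60.0  # "under one minute" with architecture changes, taken as an upper bound
HW_SPEEDUPS = (10, 100)    # hardware acceleration range guessed above


def projected_latency(seconds: float, speedup: float) -> float:
    """Latency after applying a multiplicative speedup factor."""
    return seconds / speedup


if __name__ == "__main__":
    for factor in HW_SPEEDUPS:
        latency = projected_latency(TUNED_ARCH_SECONDS, factor)
        print(f"{factor:3d}x hardware speedup: about {latency:.1f}s per inference")
```

That puts the projection between about 0.6s and 6s, so 1s is toward the optimistic end of the range.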
LLMs, however, are still going to be unreasonably slow for a long time.