I don't really follow this, but isn't it a bad sign for ROCm that, for example, ZLUDA + Blender 4's CUDA back-end delivers better performance than the native Radeon HIP back-end?
Could be that the CUDA backend has seen far more specialization optimizations whereas the seeingly fairly fresh HIP backend hasn't had as many developers looking at it, in the end a few more control instructions on the CPU side to go through the ZLUDA wrapper will be insignificant compared to all the time spent inside better optimized GPU kernels.
Surely this can be attributed to Blender's HIP code just being suboptimal because nobody really cares about it. By extension nobody cares about it because performance is suboptimal.
I'd say it's even worse, since for rendering Optix is like 30% faster than CUDA. But that requires the tensor cores. At this point AMD is waaay behind hardware wise.