I don't really follow this, but isn't it a bad sign for ROCm that, for example, ZLUDA + Blender 4's CUDA back-end delivers better performance than the native Radeon HIP back-end?
I'd say it's even worse, since for rendering OptiX is like 30% faster than CUDA. But that requires the RT cores. At this point AMD is waaay behind hardware-wise.