I don't really follow this, but isn't it a bad sign for ROCm that, for example, ZLUDA + Blender 4's CUDA back-end delivers better performance than the native Radeon HIP back-end?
Surely this can be attributed to Blender's HIP code just being suboptimal because nobody really cares about it. By extension nobody cares about it because performance is suboptimal.