This is yet another reason why researchers prefer CUDA to the alternatives.
https://developer.nvidia.com/nsight-systems