A while ago I was working on some CUDA kernels for n-body physics simulations. It wasn't too complicated, and the end result was generative art. The problem was that it was quite slow and I didn't know why. The core of the application was written in Clojure, so I wrote a simple macro that wrapped every function in a namespace with a span and shipped all the data to Jaeger. This ended up being exactly what I needed: I found out that the two slowest operations were transferring data to and from GPU memory and writing each frame (image) out to disk.
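The original Clojure macro isn't shown, so here is a rough Python sketch of the same idea: wrap every function in a namespace so that each call records a timed span. Everything named here (`SPANS`, `traced`, `instrument_module`, and the `sim` module with its two functions) is a hypothetical stand-in, and the spans are collected in a plain list rather than exported to Jaeger.

```python
import functools
import inspect
import time
import types

# Hypothetical in-memory span store; a real setup would export spans to a backend such as Jaeger.
SPANS = []


def traced(fn):
    """Wrap fn so that every call appends a span (name + duration) to SPANS."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the span even if fn raises.
            SPANS.append({"name": fn.__name__,
                          "seconds": time.perf_counter() - start})
    wrapper.__traced__ = True  # marker so a function is never wrapped twice
    return wrapper


def instrument_module(module):
    """Rebind every plain function found in module to its traced version."""
    for name, obj in list(vars(module).items()):
        if inspect.isfunction(obj) and not getattr(obj, "__traced__", False):
            setattr(module, name, traced(obj))


# Hypothetical stand-ins for the two hot spots mentioned above.
def copy_to_device(n):
    return list(range(n))


def write_frame(pixels):
    return len(pixels)


sim = types.ModuleType("sim")  # plays the role of the Clojure namespace
sim.copy_to_device = copy_to_device
sim.write_frame = write_frame

instrument_module(sim)
sim.write_frame(sim.copy_to_device(1000))

# List recorded spans, slowest first.
for span in sorted(SPANS, key=lambda s: s["seconds"], reverse=True):
    print(f'{span["name"]}: {span["seconds"]:.6f}s')
```

Sorting the recorded spans by duration is the crude equivalent of looking at the slowest spans in the Jaeger UI. Note that this sketch wraps every function object found in the module, including ones that were imported into it.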
I see how useful this approach would be in many other places, but OTel is too often geared towards HTTP services. Even simple async or queue processing is not straightforward to instrument. That said, there have been improvements, such as span links, which let a span reference spans in other traces.