
68 points GeorgeHaldane | 2 comments

Morning HN.

I often found myself wondering "how much of the total runtime does this code segment take", and it's often quite annoying to figure out with optimizations enabled, especially when working on something new or testing someone else's implementation without proper tooling set up. Wanted a single-include lib that would let us write something like:

```
PROFILE("Loop 1")
for (...)
    // some work
```

and have the next expression automatically record time & dump results to a table. Wrote a few macros to do exactly that a few months back, but they were primitive and basically unusable for recursive code.
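For anyone curious how a macro can time "the next statement", here is a minimal sketch assuming C++17. It is not the library's code: `ScopeTimer`, `profile_totals`, `print_profile_table` and `sum_to` are made-up names for illustration, and this version is single-threaded and has no notion of nesting.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical names throughout; this is not the utl::profiler API.
inline std::map<std::string, double>& profile_totals() {
    static std::map<std::string, double> totals; // label -> accumulated seconds
    return totals;
}

// RAII timer: adds its lifetime to the label's total when destroyed.
struct ScopeTimer {
    using clock = std::chrono::steady_clock;
    const char*       label;
    clock::time_point start;

    explicit ScopeTimer(const char* l) : label(l), start(clock::now()) {}
    ~ScopeTimer() {
        const std::chrono::duration<double> elapsed = clock::now() - start;
        profile_totals()[label] += elapsed.count();
    }
};

// C++17 if-with-initializer: the timer lives exactly as long as the
// statement that follows the macro.
#define PROFILE(label) if (ScopeTimer profile_timer_{label}; true)

// Dump the accumulated results as a simple two-column table.
inline void print_profile_table() {
    for (const auto& [label, seconds] : profile_totals())
        std::printf("%-20s %10.6f s\n", label.c_str(), seconds);
}

// Example of a profiled loop.
inline long long sum_to(long long n) {
    long long sum = 0;
    PROFILE("Loop 1")
    for (long long i = 0; i < n; ++i) sum += i;
    return sum;
}
```

Calling `sum_to(...)` a few times and then `print_profile_table()` prints one row per label. Keying on a string map is exactly the kind of overhead that gets painful once macros are nested or recursive.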

Tried to come up with a more generic solution that would build a call graph for nested profiler macros, handle threads, etc., but doing so naively would be super slow since we'd need some kind of recursive map of nodes with callsites as keys.

Recently had a revelation that it is possible to use macro-generated thread_local's to associate callsites with integer IDs on the fly, and that with some effort the call graph can be neatly encoded in a few contiguous arrays, with all graph building & traversal logic reduced to simple checks and array lookups. Realized threading can be supported quite easily too, in an almost lock-free fashion.
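A rough sketch of that idea, to make it concrete. This is my own simplified guess at the shape of it, not the actual utl::profiler implementation: `CallGraph`, `Callsite`, `ScopeGuard`, `thread_graph` and `PROFILE_SCOPE` are all hypothetical names, and it uses a scope-style macro rather than the "next statement" form.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// All names are hypothetical; this is not the utl::profiler implementation.
struct CallGraph {
    std::vector<const char*>      label;   // node -> callsite label
    std::vector<int>              parent;  // node -> parent node (-1 for root)
    std::vector<double>           seconds; // node -> accumulated inclusive time
    std::vector<std::vector<int>> child;   // child[node][callsite_id] -> node, or -1
    int current        = 0;                // node we are currently inside
    int callsite_count = 0;                // IDs handed out so far on this thread

    CallGraph() { add_node("<root>", -1); }

    int add_node(const char* l, int p) {
        label.push_back(l);
        parent.push_back(p);
        seconds.push_back(0.0);
        child.emplace_back();
        return static_cast<int>(label.size()) - 1;
    }

    // Step into the child of `current` for this callsite, creating it on first visit.
    void enter(int id, const char* l) {
        const auto idx = static_cast<std::size_t>(id);
        if (child[current].size() <= idx) child[current].resize(idx + 1, -1);
        if (child[current][idx] < 0) {
            const int node = add_node(l, current); // may reallocate `child`
            child[current][idx] = node;
        }
        current = child[current][idx];
    }

    // Record elapsed time on the current node and step back to its parent.
    void leave(double elapsed) {
        seconds[current] += elapsed;
        current = parent[current];
    }
};

inline CallGraph& thread_graph() {
    thread_local CallGraph graph; // one graph per thread, so no locking here
    return graph;
}

// Constructed once per thread per callsite; grabs the next free integer ID.
struct Callsite {
    int         id;
    const char* label;
    explicit Callsite(const char* l) : id(thread_graph().callsite_count++), label(l) {}
};

struct ScopeGuard {
    using clock = std::chrono::steady_clock;
    clock::time_point start;
    explicit ScopeGuard(const Callsite& cs) : start(clock::now()) {
        thread_graph().enter(cs.id, cs.label);
    }
    ~ScopeGuard() {
        const std::chrono::duration<double> d = clock::now() - start;
        thread_graph().leave(d.count());
    }
};

#define PROFILE_CAT2(a, b) a##b
#define PROFILE_CAT(a, b) PROFILE_CAT2(a, b)
#define PROFILE_SCOPE(label)                                        \
    thread_local Callsite PROFILE_CAT(callsite_, __LINE__){label};  \
    ScopeGuard PROFILE_CAT(guard_, __LINE__){PROFILE_CAT(callsite_, __LINE__)}

// Recursive example: each recursion depth becomes its own node in the graph.
inline int fib(int n) {
    PROFILE_SCOPE("fib");
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}
```

Entering a scope is one bounds check and one array lookup once the node exists, which is the part that avoids the recursive map. Merging per-thread graphs at the end would still need some synchronization; the sketch leaves that out.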

After a few days of effort ended up building what I believe is a very much usable single-header profiling lib. Couldn't find anything quite like it, so I'd like to present it here and hear some opinions on the product:

https://github.com/DmitriBogdanov/UTL/blob/master/docs/modul...

1. bogwog No.43686700
How does this compare to Microprofile?

https://github.com/jonasmr/microprofile

Btw, I recently worked with a library that had their own profiler which generated a Chrome trace file, so you could load it up in the Chrome dev tools to explore the call graph and timings in a fancy UI.

It seems like such a good idea and I wish more profiling frameworks tried to do that instead of building their own UI.
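For reference, emitting that format is not much work. A minimal sketch, assuming the JSON object form of the Chrome Trace Event Format with "complete" events (`"ph":"X"`, timestamps and durations in microseconds); `TraceEvent` and `to_chrome_trace` are hypothetical names, the process ID is hardcoded to 1, and event names are assumed to contain no characters that need JSON escaping:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one timed region.
struct TraceEvent {
    std::string name;
    long long   start_us;    // start timestamp, microseconds
    long long   duration_us; // duration, microseconds
    int         thread_id;
};

// Serialize events as {"traceEvents":[...]} using complete ("X") events.
// Assumes names contain no quotes or backslashes (no escaping is done).
inline std::string to_chrome_trace(const std::vector<TraceEvent>& events) {
    std::ostringstream out;
    out << "{\"traceEvents\":[";
    for (std::size_t i = 0; i < events.size(); ++i) {
        const TraceEvent& e = events[i];
        if (i > 0) out << ',';
        out << "{\"name\":\"" << e.name << "\",\"ph\":\"X\""
            << ",\"ts\":" << e.start_us << ",\"dur\":" << e.duration_us
            << ",\"pid\":1,\"tid\":" << e.thread_id << '}';
    }
    out << "]}";
    return out.str();
}
```

Writing the returned string to a `.json` file should give something the trace viewer can open, with nested regions shown stacked per thread.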

replies(1): >>43687689 #
2. GeorgeHaldane No.43687689
Haven't worked with it, but from an initial look it's quite a different thing, closer to a frame-based profiler like Tracy (https://github.com/wolfpld/tracy).

As far as differences go:

Microprofile:

  - frame-based
  - needs a build system
  - memory usage starts at 2 MB per thread
  - runs 2 threads of its own
  - provides system-specific info
  - good support for GPU workloads
  - provides live view
  - seems like a good choice for gamedev / rendering

utl::profiler:

  - no specific pipeline
  - single include
  - memory usage starts at approx. nothing and would likely stay in kilobytes
  - doesn't run any additional threads
  - fully portable, nothing platform specific whatsoever, just standard C++
  - doesn't provide system-specific info, just pure timings
  - seems like a good choice for small projects or embedded (since the only thing it needs is a C++ compiler)