General rundown of the logic can be found in this comment on reddit:
https://www.reddit.com/r/cpp/comments/1jy6ver/comment/mmze20...About linear IDs: A call graph in general case is a tree of nodes, each node has a single parent and an arbitrary amount of children. Each node accumulates time spend in the "lower" branches. A neat property of the callgraph relative to a generic tree, is that every node can be associated with a callsite. For example, if a some function f() calls itself 3 recursively, there will be multiple nodes corresponding to it, but in terms of callsite there is still only one. So lets take some simple call graph as an example:
Callgraph: f() -> f() -> f() -> g()
-> h()
Node id: 0 1 2 3,4
Let's say f() has callsite id '0', g() has callsite id '1', h() has callsite id '2'. The callgraph will then consist of N=5 nodes with M=3 different callsites:
Node id: { 0 1 2 3 4 }
Callsite id: { 0 0 0 1 2 }
We can then encode all "prev."" nodes as a single N vector, and all "next" nodes as a MxN matrix, which has some kind of sentinel value (like -1) in places with no connection. For this example this will result in following:
Node id: { 0 1 2 3 4 }
Prev. id: { x 0 1 2 2 }
Next id matrix: [ 1 2 3 x x ]
[ x x 4 x x ]
Every thread has a thread-local callgraph object that keeps track of all this graph traversal, it holds 'current_node_id'. Traversing backwards on the graph is a single array lookup:
current_node_id = prev_node_ids[current_node_id];
Traversing forwards to an existing callgraph node is a lookup & branch:
next_node_id = next_node_ids[callsite_id, current_node_id]
if (next_node_id = x) create_node(next_node_id); // will be usually predicted
else current_node_id = next_node_id;
New nodes can be created pretty cheaply too, but too verbose for a comment. The key to tracking the callsites and assigning them IDs is thread_local local variables generated by the macro:
https://github.com/DmitriBogdanov/UTL/blob/master/include/UT...
When callsite marker initializes (which only happens once), it gets a new ID. Timer then gets this 'callsite_id' an passes it to the forwards-traversal. The way we get function names is by simply remembering __FILE__, __func__, __LINE__ pointers in another array of the call graph, they get saved during the callsite marker initialization too. As far as performance goes everything we do is cheap & simple operations, at this point the main overhead is just from taking the timestamps.