Great work! It's nice seeing another observability tool. Demo is neat and easy to navigate.
Couple of questions:
What's the overhead of tracing + logging observed by users? I see many tools being built on top of the OpenTelemetry eBPF tracer, which is nice to see.
The OpenTelemetry eBPF tracer uses sampling to capture traces. Do other types of data collection in the tool (e.g., HTTP traces) use sampling as well?
When finding SLO violations, can this tool find the bug if the latency spikes are infrequent (e.g., one spike every 5 minutes to 1 hour)? I'm curious whether the team has experienced such events, and whether those pmax latencies even matter to customers given how rarely they occur.
I see that the flamegraph is a CPU flamegraph. Does off-CPU time (waiting on disk, network, etc.) matter? Or does the CPU flamegraph give developers enough to solve the issue?