

Note: this is an update to [this](https://news.ycombinator.com/item?id=44568529) "Show HN" post.

timep is a state-of-the-art [debug-]trap-based bash profiler that is efficient and extremely accurate. Unlike other profilers, timep records:

1. per-command wall-clock time

2. per-command CPU time, and

3. the hierarchy of parent function calls/subshells for each command

The wall-clock + CPU time combination lets you determine whether a particular command is CPU-bound or IO-bound, and the hierarchical logging gives you a map of how the code actually executed.
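As a rough illustration of that distinction (using bash's built-in `time` keyword here, not timep):

    time head -c 100M /dev/zero | sha256sum   # CPU-bound: CPU time ~= wall-clock time
    time sleep 2                              # blocking/IO-bound: CPU time << wall-clock time

A command whose CPU time roughly matches its wall-clock time is doing real work; one whose CPU time is near zero is mostly waiting.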

The standout feature of timep is that it will take these records and automatically generate a bash-native flamegraph (that shows bash commands, not syscalls).

------------------------------------------------

USAGE

timep is extremely easy to use - just source the `timep.bash` file from the repo and add "timep" in front of whatever you want to profile. For example:

    . /path/to/timep.bash
    timep ./some_script
    echo "stdin" | timep some_function
ZERO changes need to be made to the code being profiled!

------------------------------------------------

EXAMPLES

[test code that will be profiled](https://github.com/jkool702/timep/blob/main/TESTS/timep.test...)

[output profile for that test code](https://github.com/jkool702/timep/blob/main/TESTS/OUTPUT/out...)

[flamegraph for that test code](https://github.com/jkool702/timep/blob/main/TESTS/OUTPUT/fla...)

[flamegraph from a "real world" test of "forkrun", a parallelization engine written in bash](https://github.com/jkool702/timep/blob/main/TESTS/FORKRUN/fl...)

In the "forkrun test", 13 different checksums were computed for ~670k small files on a ramdisk using 28 parallel workers. this was repeated twice. In total, this test ran around 67,000 individual bash commands. [This is its `perf stat` (without timep)](https://github.com/jkool702/timep/blob/main/TESTS/FORKRUN/pe...).

------------------------------------------------

EFFICIENCY AND ACCURACY

The forkrun test (see "examples" section above) was basically as demanding a workload as one can have in bash. It fully utilized 24.5 cores on a 14c/28t i9-7940x CPU, racking up >840 seconds of CPU time in ~34.5 seconds of wall-clock time. When profiling this group of 67,000 commands with timep:

1. the time it took for the code to run with the debug-trap instrumentation was ~38 seconds, an increase of just over 10%. CPU time had a similar increase.

2. the time profile was ready at +2 minutes (1 minute + 15 seconds after the profiling run finished)

3. the flamegraphs were ready at +5 minutes (4 minutes + 15 seconds after the profiling run finished)

Note that timep records both "start" and "stop" timestamps for every command, and the debug-trap instrumentation runs between one command's "stop" timestamp and the next command's "start" timestamp, meaning the error in the profile's timings is far less than the 10% overhead. Comparing the total (sys+user) CPU time that perf stat gave (without using timep) and the CPU time timep gives (from summing the CPU time of all ~67,000 commands), the difference is virtually always less than 0.5%, and often less than 0.2%. I've seen it as low as 0.04%, which is 1/3 of a second on a run that took ~850 seconds of CPU time.
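For anyone curious how trap-based timing works in principle, here is a minimal sketch of the idea (illustrative only - not timep's actual implementation - and it needs bash 5+ for `$EPOCHREALTIME`):

    exec 3>>/tmp/trace.log
    trap '
        _stop=$EPOCHREALTIME   # "stop" stamp for the command that just finished
        [[ -n ${_start-} ]] && printf "%s %s %s\n" "$_start" "$_stop" "$_prev" >&3
        _prev=$BASH_COMMAND    # the command about to run
        _start=$EPOCHREALTIME  # its "start" stamp, taken last so the trap work stays outside the measured interval
    ' DEBUG

Everything between the `_stop` and `_start` assignments is trap overhead: it inflates total wall-clock time, but it stays outside each command's measured interval.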

------------------------------------------------

MAJOR CHANGES SINCE THE LAST "SHOW HN" POST

1. CPU time is now recorded too (instead of just wall-clock time). This is done via a loadable builtin that calls `getrusage` and (if available) `clock_gettime` to efficiently and accurately determine the CPU time of the process and all its descendants.

2. the .so file required to use the loadable builtin mentioned in #1 is built directly into the script as an embedded compressed base64 sequence. I also developed the bash-native compression scheme that it uses. The .so files for x86_64, aarch64, ppc64le and i686 are all included. I'm hoping to add armv7 soon as well. The flamegraph-generator Perl script is also embedded, making the script 100% self-contained. NOTE: these embedded base64 strings include both sha256 and md5 checksums of the resulting .so file that are verified on extraction (see the first sketch after this list).

3. the flamegraph generation has been completely overhauled. The flamegraphs now a) are colored based on runtime (hot colors = longer runtime), b) desaturate colors for commands where CPU time << wall-clock time (e.g., blocking reads, sleep, wait, ...), and c) use a runtime-weighted CDF color mapping (see the second sketch after this list) that ensures, regardless of the distribution of the underlying data, that the resulting flamegraph has a roughly equal amount of each color in the colorspace (where "equal" means "the same number of pixels are showing each color"). timep also combines multiple flamegraphs (showing wall-clock time vs CPU time, and using the full vs folded set of traces) by vertically stacking them into a single SVG image, giving "dual stack" and "quad stack" flamegraphs.

4. the post-processing workflow has been almost completely rewritten, making it more robust, easier to understand/maintain, and much faster. The "forkrun" test linked above (which ran 67,000 commands) previously took ~20 minutes to post-process. With the new version, you can get a profile in 2 minutes or a profile + flamegraph in 5 minutes - a 4x to 10x speedup!
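To make the self-extraction scheme in #1 and #2 concrete, here is a minimal sketch of the general approach (all names are illustrative, and timep's bash-native decompression step is omitted):

    # in timep these strings are embedded directly in the script
    so_b64='...base64-encoded .so for the current architecture...'
    so_sha256='...expected sha256 of the decoded file...'

    so_path="${TMPDIR:-/tmp}/timep_cputime.$$.so"
    printf '%s' "$so_b64" | base64 -d > "$so_path"

    # refuse to load the builtin unless the checksum matches
    printf '%s  %s\n' "$so_sha256" "$so_path" | sha256sum -c --quiet - || exit 1

    # load the loadable builtin into the running shell ("cputime" is an illustrative name)
    enable -f "$so_path" cputime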
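And a minimal sketch of the runtime-weighted CDF color mapping in #3, assuming one per-frame runtime per input line: sort the runtimes, then assign each frame a palette index based on its position in the cumulative-runtime distribution, so every color ends up covering roughly the same total runtime (and hence roughly the same number of pixels):

    sort -n runtimes.txt | awk '
        { t[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += t[i]
                printf "%s %d\n", t[i], int(255 * cum / total)   # runtime -> palette index 0..255
            }
        }'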

imiric:
I'm surprised there's been no discussion on this project.

A part of me thinks that if you have to profile a Bash script, it's a sign that you've surpassed its intended use case, and a proper programming language with all its modern tooling would be a saner way forward. But then again, that line is often blurry, and if the script has already grown in size, it might be a sunk cost to consider rewriting it, so I can see cases where this tool would come in handy.

I find it hard to believe that there is minimal overhead from the instrumentation, as the README claims. Surely the additional traps alone introduce an overhead, and tracking the elapsed time of each operation even more so. It would be interesting to know what the actual overhead is. Perhaps this is difficult to measure with a small script, whereas a larger one would show a greater difference.

The profile output is a bit difficult to parse at first glance, but I think I get it after a closer look.

In any case, this is some next-level wizardry. The amount of patience required to instrument Bash scripts must've been monumental. Kudos to you, Sir.

BTW, you're probably aware of it, but you might want to consider using Bats[1] for tests. I've found it helpful for small scripts.

EDIT: I took a look at `timep.bash`, and I may have nightmares tonight... Sweet, fancy Moses.

Also, I really don't like that it downloads and loads some random .so files, and base64-encoded blobs. I would personally not use this based on that alone, and you shouldn't assume that users will trust you. If this is required for correct functionality, have a separate well-documented step for users to download the required files manually. Or better yet: make their source visible as well, and have users compile them on their own.

[1]: https://bats-core.readthedocs.io/

jkool702:
Also, re: BATS:

I'm aware of it, but have never actually ended up using it. I've heard the sentiment you imply before - that it's great for fairly simple scripts, but not so much for long and complicated ones. And, well, the handful of bash projects that I've developed enough to want unit testing for all range from "long and complicated" to "nightmare inducing" (lol).

It's on my "to try someday" list, but I have a sneaking suspicion that timep is not a good project to try out BATS on for the first time.