For my purposes, though, I use it all the time as a quick and easy sanity check.
$ hyperfine --warmup 3 './hello-world-bin-sh.sh' './hello-world-env-python3.py'
Benchmark 1: ./hello-world-bin-sh.sh
Time (mean ± σ): 1.3 ms ± 0.4 ms [User: 1.0 ms, System: 0.5 ms]
...
Benchmark 2: ./hello-world-env-python3.py
Time (mean ± σ): 43.1 ms ± 1.4 ms [User: 33.6 ms, System: 8.4 ms]
...
But for everything in the right range (milliseconds, seconds, minutes or above), hyperfine is well suited.
If you see any reason for putting “statistical” in quotes, please let us know. hyperfine does not run a lot of tests, but it does try to find outliers in your measurements. This is really valuable in some cases. For example: we can detect when the first run of your program takes much longer than the rest of the runs. We can then show you a warning to let you know that you probably want to either use some warmup runs, or a "--prepare" command to clean (OS) caches if you want a cold-cache benchmark.
> But there’s no good way to say “just run it for 5 seconds and give me the best answer you can”.
What is the "best answer you can"?
> It’s very much designed for nanosecond to low microsecond benchmarks.
Absolutely not. With hyperfine, you cannot measure execution times in the "low microsecond" range, let alone the nanosecond range. See also my other comment.
hyperfine -N -- ls "$dir" \; my_ls "$dir"
Looks fine to me. Obviously it's too late to undo that mistake, but a new flag to enable new behavior wouldn't hurt anyone.
Back in the day my goal for Advent of Code was to run all solutions in under 1 second total. Hyperfine would take like 30 minutes to benchmark a 1 second runtime.
It was hyper frustrating. I could not find a good way to get Hyperfine to do what I wanted.
> I could not find a good way to get Hyperfine to do what I wanted
This is all documented here: https://github.com/sharkdp/hyperfine/tree/master?tab=readme-... under "Basic benchmarks". The options to control the number of runs are also listed in `hyperfine --help` and in the man page. Please let us know if you think we can improve the documentation / discovery of those options.
Current defaults: "By default, it will perform at least 10 benchmarking runs and measure for at least 3 seconds." If your program takes 1s to run, it should take 10 seconds to benchmark.
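To make that arithmetic concrete, here is a rough back-of-the-envelope sketch of what the quoted defaults imply. It is not hyperfine's actual code: `estimate_runs` and `estimate_total_ms` are made-up helper names, times are whole milliseconds, and it ignores warmup runs, shell startup calibration, and any `--prepare` work.

```shell
# Rough estimate only, assuming the documented defaults:
# "at least 10 benchmarking runs" and "measure for at least 3 seconds".
estimate_runs() {
    per_run_ms=$1
    min_runs=10
    min_time_ms=3000
    # Ceiling division: runs needed to fill the minimum measurement time.
    runs=$(( (min_time_ms + per_run_ms - 1) / per_run_ms ))
    # Never go below the minimum run count.
    if [ "$runs" -lt "$min_runs" ]; then
        runs=$min_runs
    fi
    echo "$runs"
}

# Approximate total measurement time in milliseconds.
estimate_total_ms() {
    echo $(( $(estimate_runs "$1") * $1 ))
}

estimate_runs 1000       # prints 10    (a program that takes 1 s)
estimate_total_ms 1000   # prints 10000 (about 10 seconds in total)
estimate_runs 43         # prints 70    (roughly the Python hello-world above)
```

So under these assumptions a 1-second program should be done in about 10 seconds, nowhere near 30 minutes. If even that is too long, the number of runs can be set directly with `--runs`, or bounded with `--min-runs` and `--max-runs`.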
Is it possible that your program was waiting for input that never came? One "gotcha" is that it expects each argument to be a full program, so if you ran `hyperfine ./a.out input.txt`, it will first bench a.out with no args, then try to bench input.txt (which will fail). If a.out reads from stdin when no argument is given, then it would hang forever, and I can see why you'd give up after a half hour.
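The quoting issue is easy to see without hyperfine at all. In this small sketch, `count_commands` is a hypothetical stand-in that just reports how many separate arguments the shell passes along, which is how many benchmark commands hyperfine would see:

```shell
# Hypothetical helper: prints the number of arguments it received.
# Each argument corresponds to one command hyperfine would try to benchmark.
count_commands() {
    echo "$#"
}

count_commands ./a.out input.txt     # prints 2: "./a.out" and "input.txt" are separate commands
count_commands './a.out input.txt'   # prints 1: one command, with input.txt as its argument
```

The second form is the one you want here: `hyperfine './a.out input.txt'`.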