> Robust statistics with p-values (not just min/max, compensation for multiple hypotheses, no Gaussian assumptions)
This is not included in hyperfine's core, but we do have scripts to compute "advanced" statistics and to perform t-tests: https://github.com/sharkdp/hyperfine/tree/master/scripts
Please feel free to comment here if you think it should be included in hyperfine itself: https://github.com/sharkdp/hyperfine/issues/523
> Automatic isolation to the greatest extent possible (given appropriate permissions)
This sounds interesting. Please feel free to open a ticket if you have any ideas.
> Interleaved execution, in case something external changes mid-way.
Please see the discussion here: https://github.com/sharkdp/hyperfine/issues/21
> It just… runs things N times and then does a naïve average/min/max?
While there is nothing wrong with computing the average, minimum, and maximum, that is not all hyperfine does. We also compute modified Z-scores to detect outliers, and we issue a warning if we think the mean value is influenced by them. We also warn if the first run of a command took significantly longer than the rest of the runs, and suggest countermeasures.
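For the curious, outlier detection via modified Z-scores can be sketched in a few lines of Python. This is a sketch of the general technique, not hyperfine's exact implementation; the 3.5 cutoff is the commonly used Iglewicz–Hoaglin threshold, which may differ from the one hyperfine uses:

```python
import statistics

def modified_z_scores(times):
    # Robust Z-scores based on the median and the median absolute
    # deviation (MAD) instead of the mean and standard deviation,
    # so the outliers themselves do not distort the detection.
    med = statistics.median(times)
    mad = statistics.median(abs(t - med) for t in times)
    if mad == 0:
        return [0.0] * len(times)
    return [0.6745 * (t - med) / mad for t in times]

def outliers(times, threshold=3.5):
    # 3.5 is the Iglewicz-Hoaglin cutoff commonly used with modified
    # Z-scores; hyperfine's exact threshold may differ.
    zs = modified_z_scores(times)
    return [t for t, z in zip(times, zs) if abs(z) > threshold]
```

For example, in a run like `[0.50, 0.52, 0.51, 0.49, 1.80]`, the last measurement would be flagged, since it inflates the mean well above the median.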
Depending on the benchmark, I tend to look at either the `min` or the `mean`. If I need something more fine-grained, I export the results and use the scripts referenced above.
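As an illustration of that workflow, a JSON export (produced with `--export-json`) can be post-processed in a few lines. The `results`, `command`, and `times` fields are part of hyperfine's actual JSON schema; the summary itself is my own sketch, not one of the linked scripts:

```python
import json
import statistics

def summarize(path):
    # Summarize a hyperfine JSON export: one dict per benchmarked
    # command, with a few robust and non-robust statistics.
    with open(path) as f:
        data = json.load(f)
    summaries = []
    for result in data["results"]:
        times = result["times"]  # individual run times in seconds
        summaries.append({
            "command": result["command"],
            "min": min(times),
            "median": statistics.median(times),
            "mean": statistics.mean(times),
        })
    return summaries
```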
> At that rate, one could just as well use a shell script and eyeball the results.
Statistical analysis (basic as you may consider it) is just one reason why I wrote hyperfine. The other is that I wanted to make benchmarking easy to use. I use warmup runs, preparation commands, and parametrized benchmarks all the time, and I frequently use the Markdown or JSON export to generate graphs or histograms. This is my personal experience. If you are not interested in any of these features, you can obviously "just as well use a shell script".
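For concreteness, these are representative invocations of the features mentioned above. The flags are hyperfine's actual ones; the benchmarked commands are placeholders:

```shell
# Warmup runs fill caches before measurement starts
hyperfine --warmup 3 'grep -R TODO src'

# Preparation command executed before every timing run
hyperfine --prepare 'make clean' 'make'

# Parametrized benchmark: {threads} is substituted with 1..8
hyperfine --parameter-scan threads 1 8 'make -j {threads}'

# Export for tables, graphs, or histograms
hyperfine --export-markdown results.md --export-json results.json 'sleep 0.3'
```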