Hyperfine: A command-line benchmarking tool

1. forrestthewoods ◴[19 Nov 24 06:37 UTC] No.42180588[source]▶

Hyperfine is hyper frustrating because it only works with really really fine microsecond level benchmarks. Once you get into the millisecond range it’s worthless.

replies(2): >>42180660 #>>42182084 #

2. anotherhue ◴[19 Nov 24 06:50 UTC] No.42180660[source]▶

>>42180588 (TP) #

It spawns a new process each time, right? I would think that would put a cap on how accurate it can get.

For my purposes I use it all the time though, quick and easy sanity-check.

replies(2): >>42180722 #>>42180749 #

3. forrestthewoods ◴[19 Nov 24 07:03 UTC] No.42180722[source]▶

>>42180660 #

The issue is it runs a kajillion tests to try and be “statistical”. But there’s no good way to say “just run it for 5 seconds and give me the best answer you can”. It’s very much designed for nanosecond to low microsecond benchmarks. Trying to fight this is trying to smash a square peg through a round hole.

replies(3): >>42180876 #>>42180891 #>>42182129 #

4. oguz-ismail ◴[19 Nov 24 07:09 UTC] No.42180749[source]▶

>>42180660 #

It spawns a new shell for each run and subtracts the average shell startup time from final results. Too much noise

replies(1): >>42180880 #

5. gforce_de ◴[19 Nov 24 07:39 UTC] No.42180876{3}[source]▶

>>42180722 #

At least it gives some numbers and points in a direction:

  $ hyperfine --warmup 3 './hello-world-bin-sh.sh' './hello-world-env-python3.py'
  Benchmark 1: ./hello-world-bin-sh.sh
    Time (mean ± σ):       1.3 ms ±   0.4 ms    [User: 1.0 ms, System: 0.5 ms]
  ...
  Benchmark 2: ./hello-world-env-python3.py
    Time (mean ± σ):      43.1 ms ±   1.4 ms    [User: 33.6 ms, System: 8.4 ms]
  ...

6. PhilipRoman ◴[19 Nov 24 07:40 UTC] No.42180880{3}[source]▶

>>42180749 #

The shell can be disabled, leaving just fork+exec

replies(1): >>42182044 #

7. PhilipRoman ◴[19 Nov 24 07:42 UTC] No.42180891{3}[source]▶

>>42180722 #

I disagree that it is designed for nano/micro benchmarks. If you want that level of detail, you need to stay within a single process, pinned to a core which is isolated from the scheduler. At least I found it almost impossible to benchmark assembly routines with it.
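To illustrate the "stay within a single process" point, here is a minimal sketch in Python. The helper `bench_in_process` and the `workload` function are hypothetical names made up for this example, not part of hyperfine. Core pinning and scheduler isolation are OS-specific and left out, so this only shows the in-process timing loop.

```python
import time


def bench_in_process(func, repeats=1000):
    """Return the fastest observed wall-clock time of func() in nanoseconds.

    Hypothetical helper for illustration. Everything runs in one process,
    so no fork/exec or shell startup cost ends up in the measurement.
    """
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        elapsed = time.perf_counter_ns() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def workload():
    # Stand-in for the routine under test.
    return sum(range(100))


if __name__ == "__main__":
    print(f"best of 1000 runs: {bench_in_process(workload)} ns")
```

Taking the minimum rather than the mean is one common choice for very short routines, since most noise only ever makes a run slower.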

8. sharkdp ◴[19 Nov 24 10:48 UTC] No.42182044{4}[source]▶

>>42180880 #

Yes. If you don't make use of shell builtins/syntax, you can use hyperfine's `--shell=none`/`-N` option to disable the intermediate shell.

replies(1): >>42182636 #

9. sharkdp ◴[19 Nov 24 10:54 UTC] No.42182084[source]▶

>>42180588 (TP) #

That doesn't make a lot of sense. It's more like the opposite of what you are saying. The precision of hyperfine is typically in the single-digit millisecond range. Maybe just below 1 ms if you take special care to run the benchmark on a quiet system. Everything below that (microsecond or nanosecond range) is something that you need to address with other forms of benchmarking.

But for everything in the right range (milliseconds, seconds, minutes or above), hyperfine is well suited.

replies(1): >>42184949 #

10. sharkdp ◴[19 Nov 24 11:00 UTC] No.42182129{3}[source]▶

>>42180722 #

> The issue is it runs a kajillion tests to try and be “statistical”.

If you see any reason for putting “statistical” in quotes, please let us know. hyperfine does not run a lot of tests, but it does try to find outliers in your measurements. This is really valuable in some cases. For example: we can detect when the first run of your program takes much longer than the rest of the runs. We can then show you a warning to let you know that you probably want to either use some warmup runs, or a "--prepare" command to clean (OS) caches if you want a cold-cache benchmark.

> But there’s no good way to say “just run it for 5 seconds and give me the best answer you can”.

What is the "best answer you can"?

> It’s very much designed for nanosecond to low microsecond benchmarks.

Absolutely not. With hyperfine, you can not measure execution times in the "low microsecond" range, let alone nanosecond range. See also my other comment.
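The first-run check mentioned above can be sketched roughly as follows. This is an assumed rule for illustration only (first run compared against the mean and standard deviation of the remaining runs, with a made-up `threshold` parameter), not hyperfine's actual outlier detection.

```python
import statistics


def first_run_is_outlier(times, threshold=3.0):
    """Return True if the first measurement is far slower than the rest.

    Assumed rule for illustration: the first run exceeds the mean of the
    remaining runs by more than `threshold` standard deviations.
    """
    if len(times) < 3:
        return False  # need at least two remaining runs for a stdev
    rest = times[1:]
    mean = statistics.mean(rest)
    spread = statistics.stdev(rest)
    return times[0] > mean + threshold * spread


if __name__ == "__main__":
    cold_start = [120.0, 43.0, 44.1, 42.5, 43.3]  # times in ms
    if first_run_is_outlier(cold_start):
        print("first run is much slower; consider warmup runs or --prepare")
```

A result of `True` corresponds to the situation where a warning about warmup runs or a `--prepare` command would be useful.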

11. oguz-ismail ◴[19 Nov 24 12:19 UTC] No.42182636{5}[source]▶

>>42182044 #

You still need to quote the command though. `hyperfine -N ls "$dir"' won't work, you need `hyperfine -N "ls ${dir@Q}"' or something. It'd be better if you could specify commands like with `find -exec'.
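One way to get that quoting from a script rather than by hand is sketched below in Python. It assumes that with `-N` hyperfine splits each command string using shell-like quoting rules, so POSIX-style quoting from `shlex` is accepted. The helper names `build_command` and `hyperfine_argv` are made up for this example.

```python
import shlex


def build_command(argv):
    """Join an argument vector into one shell-quoted command string."""
    return shlex.join(argv)


def hyperfine_argv(commands):
    """Build an argv for `hyperfine -N` from a list of argument vectors.

    Each benchmarked command becomes exactly one argument to hyperfine.
    """
    return ["hyperfine", "-N"] + [build_command(c) for c in commands]


if __name__ == "__main__":
    directory = "my dir"
    print(hyperfine_argv([["ls", directory], ["my_ls", directory]]))
```

The resulting list can be passed to something like `subprocess.run` without going through a shell, which sidesteps a second layer of quoting.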

replies(1): >>42182728 #

12. PhilipRoman ◴[19 Nov 24 12:32 UTC] No.42182728{6}[source]▶

>>42182636 #

Oh that sucks, I really hate when programs impose useless shell parsing instead of letting the user give an argument vector natively.

replies(1): >>42183105 #

13. sharkdp ◴[19 Nov 24 13:20 UTC] No.42183105{7}[source]▶

>>42182728 #

I don't think it's useless. You can use hyperfine to run multiple benchmarks at the same time, to get a comparison between multiple tools. So if you want it to work without quotes, you need to (1) come up with a way to separate commands and (2) come up with a way to distinguish hyperfine arguments from command arguments. It's doable, but it's also not a great UX if you have to write something like

    hyperfine -N -- ls "$dir" \; my_ls "$dir"

replies(1): >>42183430 #

14. oguz-ismail ◴[19 Nov 24 13:53 UTC] No.42183430{8}[source]▶

>>42183105 #

> not a great UX

Looks fine to me. Obviously it's too late to undo that mistake, but a new flag to enable new behavior wouldn't hurt anyone.

15. forrestthewoods ◴[19 Nov 24 16:03 UTC] No.42184949[source]▶

>>42182084 #

No it’s not.

Back in the day my goal for Advent of Code was to run all solutions in under 1 second total. Hyperfine would take like 30 minutes to benchmark a 1 second runtime.

It was hyper frustrating. I could not find a good way to get Hyperfine to do what I wanted.

replies(2): >>42185113 #>>42186330 #

16. sharkdp ◴[19 Nov 24 16:18 UTC] No.42185113{3}[source]▶

>>42184949 #

If that's the case, I would consider it a bug. Please feel free to report it. In general, hyperfine should not take longer than ~3 seconds, unless the command itself takes > 300 ms to run. In the latter case, we do a minimum of 10 runs by default. So if your program takes 3 min for a single iteration, it would take 30 min by default, yes. But this can be controlled using the `-m`/`--min-runs` option. You can also specify the exact number of runs using `-r`/`--runs`, if you prefer that.

> I could not find a good way to get Hyperfine to do what I wanted

This is all documented here: https://github.com/sharkdp/hyperfine/tree/master?tab=readme-... under "Basic benchmarks". The options to control the number of runs are also listed in `hyperfine --help` and in the man page. Please let us know if you think we can improve the documentation / discovery of those options.
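The arithmetic behind those defaults can be sketched like this. It is a simplified model built only from the numbers stated in this comment (at least 10 runs, about 3 seconds of measuring). It ignores warmup, any upper limit on runs, and whatever else hyperfine does internally, and the function names are hypothetical.

```python
def estimate_runs(run_ms, min_runs=10, min_time_ms=3000):
    """Estimate how many runs a benchmark takes under the stated defaults.

    Simplified model: enough runs to fill min_time_ms, but never fewer
    than min_runs. Integer milliseconds avoid floating-point rounding.
    """
    if run_ms <= 0:
        raise ValueError("run_ms must be positive")
    runs_to_fill_time = -(-min_time_ms // run_ms)  # ceiling division
    return max(min_runs, runs_to_fill_time)


def estimate_total_ms(run_ms, **kwargs):
    """Estimate the total benchmark duration in milliseconds."""
    return run_ms * estimate_runs(run_ms, **kwargs)


if __name__ == "__main__":
    for run_ms in (1, 300, 1000, 180_000):
        total_s = estimate_total_ms(run_ms) / 1000
        print(f"{run_ms} ms per run -> {estimate_runs(run_ms)} runs, {total_s} s")
```

Under this model a 1 s program takes about 10 s to benchmark, and a 3 min program takes 30 min unless the minimum run count is lowered.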

17. fwip ◴[19 Nov 24 17:57 UTC] No.42186330{3}[source]▶

>>42184949 #

I've been using it for about four or five years, and never experienced this behavior.

Current defaults: "By default, it will perform at least 10 benchmarking runs and measure for at least 3 seconds." If your program takes 1s to run, it should take 10 seconds to benchmark.

Is it possible that your program was waiting for input that never came? One "gotcha" is that it expects each argument to be a full program, so if you ran `hyperfine ./a.out input.txt`, it will first bench a.out with no args, then try to bench input.txt (which will fail). If a.out reads from stdin when no argument is given, then it would hang forever, and I can see why you'd give up after a half hour.

replies(1): >>42186830 #

18. sharkdp ◴[19 Nov 24 18:50 UTC] No.42186830{4}[source]▶

>>42186330 #

> Is it possible that your program was waiting for input that never came?

We do close stdin to prevent this. So you can benchmark `cat`, for example, and it works just fine.

replies(1): >>42187384 #

19. fwip ◴[19 Nov 24 19:51 UTC] No.42187384{5}[source]▶

>>42186830 #

Oh, my bad! Thank you for the correction, and for all your work making hyperfine.