173 points by daviducolo | 10 comments
1. geocar No.43335277
Hi David.

    $ (for x in `seq 1 100000`; do echo 'I am a Test Vector HeLlO World '"$x"; done) > /dev/shm/krep_tmp
Best of three runs shown:

    $ time ./krep -i hello /dev/shm/krep_tmp
    Found 43721 matches
    Search completed in 0.0017 seconds (2017.44 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 5 characters
      - Using AVX2 acceleration
      - Case-insensitive search
    real        0m0,005s
    user        0m0,001s
    sys         0m0,004s
    $ time ./krep HeLlO /dev/shm/krep_tmp
    Found 82355 matches
    Search completed in 0.0014 seconds (1259.72 MB/s)
    Search details:
      - File size: 1.71 MB
      - Pattern length: 5 characters
      - Using AVX2 acceleration
      - Case-sensitive search
    real        0m0,004s
    user        0m0,003s
    sys         0m0,004s
    $ time ./krep -i "HeLlO World" /dev/shm/krep_tmp
    Found 99958 matches
    Search completed in 0.0021 seconds (1700.54 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 11 characters
      - Using AVX2 acceleration
      - Case-insensitive search
    real        0m0,005s
    user        0m0,002s
    sys         0m0,004s
    $ time ./krep "I am a Test Vector HeLlO World" /dev/shm/krep_tmp
    Found 3964 matches
    Search completed in 0.0149 seconds (235.83 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 30 characters
      - Using AVX2 acceleration
      - Case-sensitive search
    real        0m0,016s
    user        0m0,015s
    sys         0m0,001s
    $ time ./krep -i "I am a Test Vector hello World" /dev/shm/krep_tmp
    Found 3964 matches
    Search completed in 0.0178 seconds (197.70 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 30 characters
      - Using AVX2 acceleration
      - Case-insensitive search
    real        0m0,021s
    user        0m0,017s
    sys         0m0,004s
Benchmark with fgrep (the first run was good enough):

    $ time fgrep -ci hello /dev/shm/krep_tmp
    100000
    real        0m0,003s
    user        0m0,003s
    sys         0m0,000s
    $ time fgrep -ci "I am a Test Vector hello World" /dev/shm/krep_tmp
    100000
    real        0m0,010s
    user        0m0,009s
    sys         0m0,000s
    $ time fgrep -c "I am a Test Vector HeLlO World" /dev/shm/krep_tmp
    100000
    real        0m0,005s
    user        0m0,004s
    sys         0m0,001s
CPU: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz. There's 40 GB of RAM free and 10 cores doing nothing; the shell runs in a cpuset. Tested on commit 95ed1853b561396c8a8bcbbdd115ed6273848e3f (HEAD -> main, origin/main, origin/HEAD). gcc is 13.3.0-6ubuntu2~24.04.

tl;dr: krep produces obviously wrong results slower than fgrep.
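Every line of the test vector contains the searched text exactly once (the trailing number never spells "hello"), so the correct answer for each krep query above is 100000, matching what fgrep -c reports. A minimal reference check, assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX). Counting the lines that -o prints counts occurrences rather than matching lines, which is closer to what krep's "Found N matches" reports. The `count` helper is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Regenerate the same test vector: 100000 lines, each containing
# "HeLlO World" exactly once.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
for x in $(seq 1 100000); do
  echo 'I am a Test Vector HeLlO World '"$x"
done > "$f"

# Hypothetical helper: grep -o prints each match on its own line, so
# wc -l yields the number of occurrences, not the number of matching lines.
# tr strips the padding some wc implementations add.
count() {
  grep -o "$@" "$f" | wc -l | tr -d ' '
}

# Each of these prints 100000.
count -i hello
count HeLlO
count -i 'HeLlO World'
count 'I am a Test Vector HeLlO World'
count -i 'I am a Test Vector hello World'
```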

replies(2): >>43335306 >>43340892
2. burntsushi No.43335306
Consider using a bigger haystack. Your timings are so short that you're mostly just measuring the overhead of running a process.

This is relevant to krep because it spawns threads to search files (I guess for files over 1MB?).

This does not mean your benchmark is worthless. It just means you can't straightforwardly generalize from it.
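One way to see how much of a few-millisecond timing is just process startup: time a search over empty input as a baseline, then compare the original file with a much larger one. A minimal sketch, assuming bash (for the `time` keyword, which produces the real/user/sys output seen above) and a grep with -c and -i; the 30x size factor and the temporary file names are arbitrary choices, not anything krep or grep prescribes.

```shell
#!/usr/bin/env bash
# Sketch: separate process startup cost from actual search cost by
# using a much larger haystack. The 30x factor is illustrative only.
small=$(mktemp)
big=$(mktemp)
trap 'rm -f "$small" "$big"' EXIT

# Same test vector as in the benchmark above (100000 lines, ~3.5 MB).
for x in $(seq 1 100000); do
  echo 'I am a Test Vector HeLlO World '"$x"
done > "$small"

# Concatenate 30 copies: ~30x the bytes and 3000000 lines.
for i in $(seq 1 30); do
  cat "$small"
done > "$big"

# Baseline: searching empty input is almost pure startup and teardown.
time grep -c hello /dev/null

# Small vs. large haystack, both case-insensitive line counts.
time grep -ci hello "$small"
time grep -ci hello "$big"
```

If the small-file time sits close to the /dev/null baseline while the large-file time grows with size, the small benchmark was mostly measuring startup. Running ./krep in place of grep with the same files would show whether its threading (which, as guessed above, may only apply to larger files) pays off once the input is big enough.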

replies(3): >>43335857 >>43336268 >>43340461
3. globnomulous No.43335857
That's a good point, though the README flatly states that krep "is designed with performance as a primary goal," so the lede's generalization that it is "blazingly fast" isn't correct, despite the later, more deeply buried caveat that "Performance may vary based on hardware, file characteristics, and search pattern" (which describes all software). And the comment you answered doesn't just say that krep is "slower" than fgrep; it says krep "produces obviously wrong results" slower.

Edit: and the fact that krep lacks regular-expression support means it's not a replacement for grep or meaningfully comparable with it.

replies(1): >>43335973
4. burntsushi No.43335973
I try my best to interpret pithy phrases describing a project as first order approximations, rather than literal statements of truth that perfectly generalize. Pithiness is important for communicating ideas quickly, but precision and pithiness are often in tension with one another. So I adjust my expectations accordingly.

Yes, I agree that the wrong results are bad. But that doesn't invalidate my point. I even went out of my way to clarify that the benchmark wasn't worthless. Benchmarking the small input case is absolutely worth it. You just can't tell much about its scaling properties when your measurement is basically "how fast does the process start and stop." Which, again, to be clear, IT MATTERS. It just probably doesn't matter as much as readers might think it matters when they see it.

So treat my comment as adding helpful context for readers that aren't experts in benchmarking grep tools from someone experienced in... benchmarking grep tools. :-) (And regexes in general. See: https://github.com/BurntSushi/rebar)

5. fanf2 No.43336268
The incorrect results are far more important than the times!
replies(1): >>43337371
6. burntsushi No.43337371
I agree.
7. gigatexal No.43340507
Give the author some grace. Sheesh. You sound like a toxic senior dev.

What about opening an issue with the incorrect results?

replies(1): >>43340806
8. geocar No.43340806
> What about opening an issue with the incorrect results

I don't work for you.

> You sound like a toxic senior dev

I'm probably much nicer to the people I work with, but that's because an enormous amount of money is involved.

If you want to have that kind of conversation, I'll consider letting you tell me what to do, but you need to understand that the service I'm providing for free doesn't come with that.

People don't like being told what to do. They are probably going to be in a bad mood if you waste their time with a bold claim that is so obviously false. You are lucky to have someone tell you that the software doesn't work, because you obviously wouldn't know otherwise.

Best of luck.

replies(1): >>43359692
9. No.43340892
10. gigatexal No.43359692
Oh well, it’s nice to know that if we work together you’ll be nice, since you’re being paid to be.

But do you treat others outside of work like this? What if your partner makes a mistake? What if your kid thinks they’ve done something awesome, but whatever it is has a huge flaw?

> They are probably going to be in a bad mood if you waste their time with a bold claim that is so obviously false.

Did you review this on company time? On your own time? How is your time being wasted giving feedback to another developer? You’re improving the community of developers. You don’t see the benefit of that?

I wonder how you tolerate failure, or weakness, in others. Not well, I think, but what do I care, right? I’m not in the business of helping anyone better themselves or anything ;-)