(davidesantangelo.github.io)

daviducolo ◴[11 Mar 25 16:16 UTC] No.43333994[source]▶

>>43333946 (OP) #

You can read my blog post about the project at https://dev.to/daviducolo/introducing-krep-building-a-high-p...

replies(4): >>43335277 #>>43335647 #>>43337748 #>>43339241 #

1. geocar ◴[11 Mar 25 17:59 UTC] No.43335277[source]▶

>>43333994 #

Hi David.

    $ (for x in `seq 1 100000`; do echo 'I am a Test Vector HeLlO World '"$x"; done) > /dev/shm/krep_tmp

Best of three runs shown:

    $ time ./krep -i hello /dev/shm/krep_tmp
    Found 43721 matches
    Search completed in 0.0017 seconds (2017.44 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 5 characters
      - Using AVX2 acceleration
      - Case-insensitive search
    real        0m0,005s
    user        0m0,001s
    sys         0m0,004s
    $ time ./krep HeLlO /dev/shm/krep_tmp
    Found 82355 matches
    Search completed in 0.0014 seconds (1259.72 MB/s)
    Search details:
      - File size: 1.71 MB
      - Pattern length: 5 characters
      - Using AVX2 acceleration
      - Case-sensitive search
    real        0m0,004s
    user        0m0,003s
    sys         0m0,004s
    $ time ./krep -i "HeLlO World" /dev/shm/krep_tmp
    Found 99958 matches
    Search completed in 0.0021 seconds (1700.54 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 11 characters
      - Using AVX2 acceleration
      - Case-insensitive search
    real        0m0,005s
    user        0m0,002s
    sys         0m0,004s
    $ time ./krep "I am a Test Vector HeLlO World" /dev/shm/krep_tmp
    Found 3964 matches
    Search completed in 0.0149 seconds (235.83 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 30 characters
      - Using AVX2 acceleration
      - Case-sensitive search
    real        0m0,016s
    user        0m0,015s
    sys         0m0,001s
    $ time ./krep -i "I am a Test Vector hello World" /dev/shm/krep_tmp
    Found 3964 matches
    Search completed in 0.0178 seconds (197.70 MB/s)
    Search details:
      - File size: 3.52 MB
      - Pattern length: 30 characters
      - Using AVX2 acceleration
      - Case-insensitive search
    real        0m0,021s
    user        0m0,017s
    sys         0m0,004s

Benchmark with fgrep (the first run was good enough):

    $ time fgrep -ci hello /dev/shm/krep_tmp
    100000
    real        0m0,003s
    user        0m0,003s
    sys         0m0,000s
    $ time fgrep -ci "I am a Test Vector hello World" /dev/shm/krep_tmp
    100000
    real        0m0,010s
    user        0m0,009s
    sys         0m0,000s
    $ time fgrep -c "I am a Test Vector HeLlO World" /dev/shm/krep_tmp
    100000
    real        0m0,005s
    user        0m0,004s
    sys         0m0,001s

CPU model name: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz. There's 40 GB of RAM free and 10 cores doing nothing. The shell is in a cpuset. This is on commit 95ed1853b561396c8a8bcbbdd115ed6273848e3f (HEAD -> main, origin/main, origin/HEAD). gcc is 13.3.0-6ubuntu2~24.04.

tl;dr: krep produces obviously wrong results slower than fgrep.
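Why the counts look wrong: the generator loop writes exactly one "HeLlO" per line, so a correct search should report one occurrence per line (100000 for the file above). Here is a minimal sketch of that sanity check on a scaled-down 1000-line copy, assuming a POSIX shell with `grep`, `seq`, and `mktemp`. The variable names are illustrative and not part of the original benchmark.

```shell
# Hypothetical scaled-down check: 1000 lines instead of 100000.
tmp=$(mktemp)
for x in $(seq 1 1000); do
  echo 'I am a Test Vector HeLlO World '"$x"
done > "$tmp"

# grep -o prints each match on its own line, so wc -l counts occurrences.
occurrences=$(grep -o -i 'hello' "$tmp" | wc -l | tr -d ' ')

# grep -c counts matching lines, which is what fgrep -c reports.
matching_lines=$(grep -c -i -F 'hello' "$tmp")

echo "occurrences=$occurrences matching_lines=$matching_lines"
rm -f "$tmp"
```

Both numbers come out as 1000 here, because each line contains the word exactly once. For this particular file, an occurrence count and a matching-line count should therefore agree.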

replies(2): >>43335306 #>>43340892 #

2. burntsushi ◴[11 Mar 25 18:03 UTC] No.43335306[source]▶

>>43335277 (TP) #

Consider using a bigger haystack. Your timings are so short that you're mostly just measuring the overhead of running a process.

This is relevant to krep because it spawns threads to search files (I guess for files over 1MB?).

This does not mean your benchmark is worthless. It just means you can't straightforwardly generalize from it.
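One way to act on this, sketched under assumptions: repeat the test vector until the file is large enough that search time rather than process startup dominates. The repeat factor, the scaled-down 1000-line base file, and the variable names are illustrative and do not come from the thread.

```shell
# Build the same kind of test vector, scaled down to 1000 lines.
small=$(mktemp)
big=$(mktemp)
for x in $(seq 1 1000); do
  echo 'I am a Test Vector HeLlO World '"$x"
done > "$small"

# Concatenate it 200 times (assumed factor; raise it until runs take
# long enough that startup cost is a small share of the total).
for i in $(seq 1 200); do
  cat "$small"
done > "$big"

big_lines=$(wc -l < "$big" | tr -d ' ')
big_bytes=$(wc -c < "$big" | tr -d ' ')
echo "lines=$big_lines bytes=$big_bytes"

rm -f "$small"
# "$big" is left in place for benchmarking; remove it when done.
```

The same `time ./krep ...` and `time fgrep -c ...` invocations can then be pointed at `"$big"`. Starting from the full 100000-line file instead of the 1000-line one gives a correspondingly larger haystack.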

replies(3): >>43335857 #>>43336268 #>>43340461 #

3. globnomulous ◴[11 Mar 25 18:58 UTC] No.43335857[source]▶

>>43335306 #

That's a good point, though the readme does flatly state that krep "is designed with performance as a primary goal," so the lede's generalization that it is "blazingly fast" isn't correct, despite the later, more deeply buried caveat that "Performance may vary based on hardware, file characteristics, and search pattern" (which describes all software). And the comment you answered doesn't say just that krep is "slower" than fgrep; it says krep "produces obviously wrong results" slower.

Edit: and the fact that krep lacks regular-expression support means it's not a replacement for grep or meaningfully comparable with it.

replies(1): >>43335973 #

4. burntsushi ◴[11 Mar 25 19:09 UTC] No.43335973{3}[source]▶

>>43335857 #

I try my best to interpret pithy phrases describing a project as first order approximations, rather than literal statements of truth that perfectly generalize. Pithiness is important for communicating ideas quickly, but precision and pithiness are often in tension with one another. So I adjust my expectations accordingly.

Yes, I agree that the wrong results are bad. But that doesn't invalidate my point. I even went out of my way to clarify that the benchmark wasn't worthless. Benchmarking the small input case is absolutely worth it. You just can't tell much about its scaling properties when your measurement is basically "how fast does the process start and stop." Which, again, to be clear, IT MATTERS. It just probably doesn't matter as much as readers might think it matters when they see it.

So treat my comment as adding helpful context for readers that aren't experts in benchmarking grep tools from someone experienced in... benchmarking grep tools. :-) (And regexes in general. See: https://github.com/BurntSushi/rebar)

5. fanf2 ◴[11 Mar 25 19:41 UTC] No.43336268[source]▶

>>43335306 #

The incorrect results are far more important than the times!

replies(1): >>43337371 #

6. burntsushi ◴[11 Mar 25 21:43 UTC] No.43337371{3}[source]▶

>>43336268 #

I agree.

7. gigatexal ◴[12 Mar 25 06:43 UTC] No.43340507{3}[source]▶

>>43340461 #

Give the author some grace. Sheesh. You sound like a toxic senior dev.

What about opening an issue with the incorrect results?

replies(1): >>43340806 #

8. geocar ◴[12 Mar 25 07:53 UTC] No.43340806{4}[source]▶

>>43340507 #

> What about opening an issue with the incorrect results?

I don't work for you.

> You sound like a toxic senior dev

I'm probably much nicer to the people I work with, but that's because an enormous amount of money is involved.

You want to have that kind of a conversation, I'll consider letting you tell me what to do, but you need to understand the service I'm providing for free doesn't come with that.

People don't like being told what to do. They are probably going to be in a bad mood if you waste their time with a bold claim that is so obviously false. You are lucky to have someone tell you that the software doesn't work, because you obviously wouldn't know otherwise.

Best of luck.

replies(1): >>43359692 #

9. ◴[12 Mar 25 08:13 UTC] No.43340892[source]▶

>>43335277 (TP) #

10. gigatexal ◴[14 Mar 25 04:53 UTC] No.43359692{5}[source]▶

>>43340806 #

Oh well it’s nice to know that if we work together you’ll be nice since you’re being paid to be.

But do you treat others outside of work like this? What if your partner makes a mistake? What if your kid thinks they’ve done something awesome but it has a huge flaw?

> They are probably going to be in a bad mood if you waste their time with a bold claim that is so obviously false.

Did you review this on company time? On your own time? How is your time being wasted giving feedback to another developer? You’re improving the community of developers. You don’t see the benefit of that?

I wonder how you tolerate failure or weakness in others. Not well, I suspect, but what do I care, right? I’m not in the business of helping anyone better themselves or anything ;-)

Show HN: Krep a High-Performance String Search Utility Written in C