A webshell and a normal file that have the same MD5

(github.com)

Show context

magicalhippo ◴[24 Sep 25 08:14 UTC] No.45357640[source]▶

>>45320382 (OP) #

Not only is MD5 broken as shown here, if you have a modern CPU it's also quite slow compared to good, non-broken alternatives. See for example this comparison[1] (post says JavaScript but it's actually OpenSSL's implementation that's actually tested).

[1]: https://lemire.me/blog/2025/01/11/javascript-hashing-speed-c...

replies(1): >>45358261 #

gruez ◴[24 Sep 25 10:02 UTC] No.45358261[source]▶

>>45357640 #

I only see new CPUs benchmarked, maybe that's because newer CPUs have SHA acceleration extensions? I'd expect SHA256 to be more complex and therefore be more computationally expensive.

replies(2): >>45358603 #>>45358721 #

1. sltkr ◴[24 Sep 25 10:48 UTC] No.45358603[source]▶

>>45358261 #

Yes, SHA256 is faster than MD5 only if you have hardware accelleration. But SHA256 itself is pretty slow compared to the state of the art. For example, BLAKE3 is just as secure as SHA256 but an order of magnitude faster.

Try this on your own system:

    $ head -c 1000000000 /dev/urandom > random-1gb
    
    $ time md5sum random-1gb 
    ef72a3616aad5117ddf40a7d5f5d0162  random-1gb
    
    real 0m2.428s
    user 0m2.192s
    sys 0m0.202s
    
    $ time sha256sum random-1gb 
    ec7d7f31c4489acae8328fddbe54157f1cb9e97b220ef502a07e1f9230969310  random-1gb
    
    real 0m3.894s
    user 0m3.697s
    sys 0m0.181s
    
    $ time b3sum random-1gb 
    11fe11cc5721faf65369d18893d7b7631f6178b4692bc0bb03b1b180273cd384  random-1gb
    
    real 0m0.282s !!!
    user 0m0.876s
    sys 0m0.124s
    
    $ time b3sum --num-threads=1 random-1gb 
    11fe11cc5721faf65369d18893d7b7631f6178b4692bc0bb03b1b180273cd384  random-1gb
    
    real 0m0.597s
    user 0m0.488s
    sys 0m0.107s

This is on an old Chromebook with Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz CPU (dual core, but with hyperthreading). Note that even using only a single thread (which SHA256 and MD5 are limited to by their design), BLAKE3 is 6x as fast as SHA256 and 4x as fast as MD5.

replies(2): >>45358755 #>>45370472 #

2. adrian_b ◴[24 Sep 25 11:12 UTC] No.45358755[source]▶

>>45358603 (TP) #

Unlike SHA-256, BLAKE3 can be evaluated in parallel, so the speedup factor over SHA-256 depends on the number of available CPU cores.

While BLAKE3 can be many times faster than SHA-256, by consuming many times more power, the amount of work for computing a hash differs much less between the 2 hashes than the execution time on a multi-core CPU.

The speed difference quoted by you for a single thread is caused by your Skylake-based CPU, which does not have the SHA hardware instructions.

Moreover, even the programs that claim to use the SHA hardware instructions may have a speed several times lower than allowed by the hardware, because the more recent CPUs, e.g. from the last 4 years, have wider SHA instructions than the older CPUs, but the programs must have been compiled to support such CPUs, e.g. Zen 3 and newer or Alder Lake and newer.

replies(1): >>45359163 #

3. amelius ◴[24 Sep 25 12:04 UTC] No.45359163[source]▶

>>45358755 #

This makes me wonder how much security suffers if you split a file in N smaller files, compute a hash for each of them, then hash the concatenation of the hashes.

replies(2): >>45359821 #>>45360276 #

4. adrian_b ◴[24 Sep 25 13:10 UTC] No.45359821{3}[source]▶

>>45359163 #

BLAKE3 and other parallelizable hashes do exactly this, but using a somewhat more complex algorithm, which ensures that the result is a secure hash.

Such an algorithm has been first published by Ralph Merkle, in 1979, but it has been improved later:

https://en.wikipedia.org/wiki/Merkle_tree

For security, it is necessary to use different hash functions at different levels in the hash tree, but this is trivially achieved by using the same hash function, but also hashing some extra distinguishing data besides the hashes from the previous level.

5. oconnor663 ◴[24 Sep 25 13:43 UTC] No.45360276{3}[source]▶

>>45359163 #

It's "easy" to do it right but also very common to do it wrong: https://jacko.io/tree_hashing.html

6. edgineer ◴[25 Sep 25 08:20 UTC] No.45370472[source]▶

>>45358603 (TP) #

>BLAKE3 is just as secure as SHA256 but an order of magnitude faster

Is this not an oxymoron? E.g. b3 then ought to be an order of magnitude easier to brute force.

replies(2): >>45371860 #>>45379318 #

7. sltkr ◴[25 Sep 25 12:07 UTC] No.45371860[source]▶

>>45370472 #

I'm talking about theoretic security, i.e. number of operations needed to perform certain attacks.

For a 256-bit cryptographic hash function, it should take an expected 2^256 attempts to find a message with a given hash (preimage attack) and around 2^128 attempts to find any collision (due to the birthday paradox), and a few other properties like that. This holds for both SHA-256 and Blake3 (as far as we know—neither algorithm has proven security*) but not for MD5.

MD5 is insecure not just because its output size of 128 bit is too short (though that's a problem too), but also because it has weaknesses that allow constructing collisions with much less than the 2^64 attempts than you would expect on the basis of its output size. That's why MD5 is considered insecure even for its size.

Generally speaking, you want your hashing primitives to be as fast as possible. The practical security then comes from the output size. If someone discovered a secure 320-bit cryptographic hash that is a trillion times faster than even Blake3 (10^12 or about 2^40), everyone should adopt it, because it would be much faster and even more secure against brute force attacks than SHA-256/Blake3 are (since 320 > 256 + 40).

While there are use cases for deliberately slow hash functions too (notably password hashing) those can be constructed using fast hash functions as primitives. For example, one of the strongest password hashing schemes (Argon2) is based on one of the fastest hashing primitives (Blake2), not a slow one as you might have expected.

8. oconnor663 ◴[25 Sep 25 21:24 UTC] No.45379318[source]▶

>>45370472 #

This is a common misconception, based on the difference between password hashing and other general uses for a cryptographic hash. Password hashing is special, because we want to protect people who pick terrible passwords, so we need guess-and-check to be expensive. But for most other use cases, like say HMAC or signing, the number of possible inputs is so astronomically large that guess-and-check would be impossible even if each guess was e.g. just a single add instruction. This distinction is why we say never to use a general purpose hash with passwords.

↑