Most active commenters
  • kstrauser(3)
  • tptacek(3)
  • Dylan16807(3)

←back to thread

Inverting the Xorshift128 random number generator

(littlemaninmyhead.wordpress.com)
108 points rurban | 24 comments | | HN request time: 1.205s | source | bottom
1. Aardwolf ◴[] No.45127470[source]
Xorshift128+ is not a cryptographic rng though, so at least this isn't a cryptographic attack...

Should programming languages use cryptographic rngs like a ChaCha20 based one in their standard libraries to stop accidental use of non cryptographic rngs for cryptographic purposes? But that comes at the cost of speed

replies(6): >>45127744 #>>45127837 #>>45127961 #>>45127992 #>>45131766 #>>45131852 #
2. kstrauser ◴[] No.45127744[source]
I think some naming conventions could go a long way. If you want to import `fast_unsafe_random`, you might think twice.
replies(2): >>45127995 #>>45135027 #
3. chaboud ◴[] No.45127837[source]
Perhaps put a warning in the name since the folks who don’t read the docs are the ones you’re trying to protect?

For example: Math.RandomNotCrypto()

When someone uses that in production for cryptographic purposes (and, yes someone is going to do that), they have to wear a dunce cap to the office for a month.

replies(2): >>45131738 #>>45132416 #
4. kbolino ◴[] No.45127961[source]
Go has taken exactly this stance, at least since Go 1.22 from 18 months ago.

They settled on 8 rounds of ChaCha with 300 bytes of internal state to amortize the latency. The end result is something only 1.5-2x slower than (their particular flavor of) PCG [1]. It was deemed good enough to become the default.

[1]: https://go.dev/blog/chacha8rand#performance

5. tptacek ◴[] No.45127992[source]
It is a cryptographic attack, just against an algorithm that isn't cryptographically strong. There's a lot of value in those.
6. thomasmg ◴[] No.45127995[source]
I agree, why would you slow down things for everybody if it's only a problem for cryptographic purposes. Xorshift128+ etc are around 10 to 30 times faster than ChaCha20.

The challenge is things that don't _obviously_ need cryptographically secure generators. For example, do you need a secure generator for the seed of a hash table, or a sorting algorithm? (For those that do use a seed). Some will argue that yes, this is important. Until a few years ago, the hash tables used static hash algorithms without any randomization, but "hash flooding" changed that. I think that nowadays, still many hash table implementations don't use secure generators.

Then, there's secure and insecure hash functions. Secure hash functions like SHA-256 are (compared to non-secure functions) specially slow for short keys. There are "somewhat" secure hash function algorithms like SipHash that can be used for this purpose.

replies(4): >>45128074 #>>45128629 #>>45128841 #>>45132983 #
7. tptacek ◴[] No.45128074{3}[source]
I think people overthink this, and we should just have a library standard full-strength CSPRNG, and then people with fussy fast-randomness needs (Monte Carlo or whatever) can just pull in libraries. "Don't need secure random and can't use it" is a very niche problem space.

It'll never happen in Go though! They're over a decade committed to this library shape.

replies(1): >>45128549 #
8. kstrauser ◴[] No.45128549{4}[source]
I strongly agree here. The default should be strong, “slow” randomness. If you know you need something different in your specific use case, and just can’t abide the safe version, import something else.
replies(1): >>45130340 #
9. kstrauser ◴[] No.45128629{3}[source]
Good point about the hashing. Python does the right thing by making you select the one you want when writing your own code. If it had a default option, make that SHA-256 so that all users get the strong one by default. But yes, if you’re not actually doing crypto stuff, say if you only want to see if two locally generated files have the same content, there are lots of much faster alternatives.
10. Dylan16807 ◴[] No.45128841{3}[source]
> why would you slow down things for everybody

Secure generators are still fast and the number of programs where you'd see a difference is very small.

11. caspper69 ◴[] No.45130340{5}[source]
I agree. .NET is the opposite of Go. Calls to System.Random use Xoshiro128++ under the hood (as of .NET 6 I believe). On the other hand, calls to RandomNumberGenerator.GetBytes() are cryptographically secure, using the Windows kernel cryptographic provider on Windows and /dev/urandom (chacha20) on Linux and arc4random_buf() on MacOS (which also uses chacha20 under the hood).

I ported around 20 RNGs to C# (all non-cs), and there are tons of uses for non-cryptographic RNGs, so I'm a little torn. I guess in modern development most people who need an RNG need it for crypto purposes (I would guess salts, keys and nonces mostly), but I'd hate to see all the Xoshiros, Mersenne Twisters, PCGs, and MWCs, etc. go the way of the dodo simply because they are not deemed fit for crypto purposes. Games, simulations, non-cryptographic hashes all need deterministic and high performance RNGs, and don't need all of the cryptographic guarantees.

To top it off, there is no standard definition of what makes an RNG cryptographically secure, so it's a slightly loaded question anyway. Everything I've read says an algo needs the following properties: forward secrecy (unable to guess future outputs given the current state), backward secrecy (if I know current outputs, I shouldn't be able to recover previous internal state or previous outputs), and the output must be indistinguishable from true random bits, even with a chosen-input attack. This is where I politely defer to the expert mathematicians and cryptographers, because I'm not equipped to perform such an analysis.

I can understand why things have developed this way though- people have needed random numbers far longer than they've needed cryptographically secure random numbers, so the default is the non-cryptographically secure variant. A language created tomorrow would likely follow in Go's footsteps and default to the cryptographically secure.

replies(2): >>45130580 #>>45132911 #
12. tptacek ◴[] No.45130580{6}[source]
No, CSPRNG vs. RNG isn't a loaded question. Every RNG that says "this isn't an according-to-Hoyle cryptographically random number generator, but..." isn't one. Most modern CSPRNGs are designed with well-understood cryptographic primitives, so they draft off those security properties. Establishing those properties for a novel set of primitives is a major undertaking.

It's a little frustrating, because there are definitely fast RNGs that have tried to blur this line. A reasonable first approximation of the current situation is that a CSPRNG should have somewhere in its core a mixing function based on an actual cryptographic hash or permutation function; if the design has to explain what that function is and how it works (as opposed to just saying "this is ChaCha20"), it's not secure. These fast RNGs, like Xorshiro and PCG, all get there by not having cryptographically strong mixing functions at their core.

For what it's worth, I think the "GetBytes() means secure, IntN() means it's not secure" is a clever misfeature. Just make all the standard library random interfaces back onto a real CSPRNG, and let people pull in PCG or whatever if they have specialized needs for fast insecure RNGs.

replies(1): >>45131832 #
13. layer8 ◴[] No.45131738[source]
People are likely to use it in security-relevant ways without being aware that the use case constitutes “crypto”.
replies(1): >>45133209 #
14. kevin_thibedeau ◴[] No.45131766[source]
No. Sometimes you want a reproducible PRNG sequence. These efforts to harden generators break the existing specifications with a rug pull.
15. wahern ◴[] No.45131832{7}[source]
> Just make all the standard library random interfaces back onto a real CSPRNG

That's what OpenBSD has done for the traditional C and POSIX randomness APIs.

Also, re your earlier comment, OpenBSD's arc4random API is everywhere now except linux/musl and Windows. POSIX now has getentropy, which on recent Linux kernel will be as fast as arc4random_buf. But it would be nice if musl got the arc4random API, which includes arc4random_uniform for generating 32-bit numbers in the interval [0, N), minimizing the risk of people screwing that up.

Unlikely vendors will takes OpenBSD's approach to the historic PRNG APIs, but they're almost all there for the arc4random API. Also, the former approach is less ideal than it sounds; the latest versions of PUC Lua, for example, use an included non-CSPRNG rather than merely binding the historic C APIs. Explicitly using the arc4random API means the semantics are explicit, too, and you can more easily audit code. It's conspicuously missing an API for floating point intervals, but perhaps that'll come along.

16. hannob ◴[] No.45131852[source]
> But that comes at the cost of speed

That is mostly a myth.

I mean... technically, yes. But the cost is so marginal that you will have a hard time even measuring it unless you generate gigabytes of data.

For pretty much all common use cases like generation of ids, tokens, etc., you can use a secure random number generator and it will not impact your performance in any meaningful way.

replies(1): >>45136240 #
17. Retr0id ◴[] No.45132416[source]
Math.random is a web API so you can't just rename it without breaking a large chunk of the web.

A non-breaking change would be to upgrade Math.random to be cryptographically secure - these days we know how to do this with minimal performance impact.

replies(1): >>45151508 #
18. Dylan16807 ◴[] No.45132911{6}[source]
What's your threshold for "high performance"? A modern CPU can use a secure algorithm and produce more than one byte per cycle. Xorshift is a bit faster but not much faster.
19. Dylan16807 ◴[] No.45132983{3}[source]
> Xorshift128+ etc are around 10 to 30 times faster than ChaCha20.

What methods, what CPU? Is that using chacha20 a couple bytes at a time? If you generate your random bytes in medium size blocks you'll probably see a much smaller difference.

20. degamad ◴[] No.45133209{3}[source]
Exactly - I'm just generating random session ids, I'm not encrypting anything (or using any bitcoins). There's no crypto here, right?
replies(1): >>45138588 #
21. cozzyd ◴[] No.45135027[source]
funsafe_random
22. tialaramex ◴[] No.45136240[source]
It's also the exact same silly argument as for the memory unsafety.

Incorrect isn't faster, it's just wrong, I can have wrong instantly and you're not faster than that, or smaller, or cheaper, or easier to understand. So you're just much worse.

23. cratermoon ◴[] No.45138588{4}[source]
Anakin Padme 4 Panel "right?" meme.
24. chaboud ◴[] No.45151508{3}[source]
This is a “next time” recommendation. Short of a time machine, we can’t change published names.

And, yes, I’d be down with going cryptographically secure (for now) with existing systems.