Inverting the Xorshift128 random number generator

1. Retr0id ◴[04 Sep 25 13:27 UTC] No.45127046[source]▶

If you represent the state as a 128-long vector of GF(2) elements, you can model the state transition function as a matrix multiplication.

This allows you to describe any output bit (at any offset in the output stream) as a function of the 128 initial state elements.

Treating the initial state vector as 128 unknown variables, you can solve for them (via gaussian elimination) with any 128 bits from the output stream, so long as you know their offsets in the output sequence.

Although, depending on what those offsets are there's a ~50% chance there's no unique solution, in which case you may need a few more bits. For the case of two consecutive 64-bit outputs, you get a unique solution.

For 128 samples, the gaussian elimination can be done in 2^21 steps, or only 2^14 steps if you can put a whole bit-vector in one register (which you can!)

If the offsets are known in advance, you can pre-compute the gaussian elimination work and end up with a constant matrix that, if multiplied by the leaked bits, obtains the initial state vector.

replies(2): >>45127136 #>>45132113 #

2. tptacek ◴[04 Sep 25 13:36 UTC] No.45127136[source]▶

>>45127046 (TP) #

If you follow the ChatGPT log, that's the angle he takes as well.

replies(2): >>45127400 #>>45130987 #

3. Retr0id ◴[04 Sep 25 14:03 UTC] No.45127400[source]▶

>>45127136 #

Oh that's fascinating, I didn't get that impression at all from the article itself.

replies(1): >>45147118 #

4. viraNtor ◴[04 Sep 25 19:03 UTC] No.45130987[source]▶

>>45127136 #

Which ChatGPT version was used? 4o? 5? It's clear from the log it suffers from serious context problems, keeps forgetting/hallucinating functions and forgetting arguments to functions is has already written.

5. m1el ◴[04 Sep 25 20:58 UTC] No.45132113[source]▶

>>45127046 (TP) #

Exactly. I've implemented a xorshift-based rng inverter previously, and here's the implementation for the algorithm in the article:

https://github.com/m1el/samaras/blob/master/src/xorshift128....

replies(2): >>45132274 #>>45147051 #

6. Retr0id ◴[04 Sep 25 21:14 UTC] No.45132274[source]▶

>>45132113 #

Seeing your comment with the precomputed matrix reminded me of this visualisation I made: https://bsky.app/profile/retr0.id/post/3kc2rz7i7fm2y

7. ScottContini ◴[06 Sep 25 06:18 UTC] No.45147051[source]▶

>>45132113 #

The blog is not about going from one known internal state to the previous internal state.

It is about not knowing the internal state but figuring it out after having 2 (or 3) consecutive outputs of the random number generator.

8. ScottContini ◴[06 Sep 25 06:32 UTC] No.45147118{3}[source]▶

>>45127400 #

If you look at the chatgpt conversation which is linked at the bottom of the article (it will take a minute to load because it is very long), you will see that my initial approach was using matrices over GF(2). I am in the process of writing a blog about how I thought I could solve this quickly using ChatGPT, but unfortunately it took me down a bunch of rabbit holes that didn't work. Ultimately the ideas that did work were my own, but I still think using ChatGPT is useful if you know how to use it right (keep it on a tight leash, more details coming soon).

Back to the matrices... I thought I have an idea using matrices that would solve it really fast. It didn't work. There were dependencies in the matrix that I did not expect that made the idea fail. You can see all the details in the chat log.

With your recommendation, it is not clear to me that you are dealing with the problem that there is an integer addition of the state values to get the outputs. This integer addition ruins the direct usage of linear algebra to solve the problem. The initial approach that ChatGPT suggested, which is its own idea (not mine), was that we could bring in the integer carry bits as unknown variables and derive them via induction. I could come back to that idea, but I am sceptical that will work.

Eventually I got fed up with ChatGPT's implementation and it trying to help debug, which was only slowing me down. It kept trying to change the underlying data structures during the debugging. I had to scold it about that: you cannot change things when you are debugging it.

Ultimately, I did away with the linear algebra and just approached it from some simple observations about how bits affect other bits. It is linear algebra under the hood, but I did not use matrices at this point.

One of the reasons why I wrote the blog is to encourage more people to look at this. We should be able to invert this in 2 or 3 outputs, whereas Z3 requires 5. Nothing I am doing is complex. I would love to see someone produce a tool that breaks it really quickly. And I really mean a tool, not just a statement on why it is weak. We all know it is weak, but web security researchers need tools to use that demonstrate exploits based upon primitives that are wrongly used. So I would strongly encourage people with ideas to implement them and blog about the ones that work.

I have another idea on the plate that I think might bring the search down to 2^20 effort, but need time to work out details. Free time is the hardest thing in life for me to find: I just have a busy life. But I also have a passion for building fast tools and puzzle solving.

replies(1): >>45151313 #

9. Retr0id ◴[06 Sep 25 17:40 UTC] No.45151313{4}[source]▶

>>45147118 #

Ah, yes, the addition throws a spanner in the works a little. 3 successive Math.random() outputs should definitely be solvable with a 2^32 brute force (~seconds on CPU). I'll try to understand your 2^26 approach a little better though.