This allows you to describe any output bit (at any offset in the output stream) as a function of the 128 initial state elements.
Treating the initial state vector as 128 unknown variables, you can solve for them (via gaussian elimination) with any 128 bits from the output stream, so long as you know their offsets in the output sequence.
Although, depending on what those offsets are there's a ~50% chance there's no unique solution, in which case you may need a few more bits. For the case of two consecutive 64-bit outputs, you get a unique solution.
For 128 samples, the gaussian elimination can be done in 2^21 steps, or only 2^14 steps if you can put a whole bit-vector in one register (which you can!)
If the offsets are known in advance, you can pre-compute the gaussian elimination work and end up with a constant matrix that, if multiplied by the leaked bits, obtains the initial state vector.