97 points indigodaddy | 20 comments
1. pmarreck No.45155251
Why not just simulate a real shuffle?

Just "cut" in a random location (rotate the deck a random amount), then split the deck roughly in half (add a bit of random variation), then "flip them together" back to front by alternately taking 1 or 2 (randomly, add a small chance of 3, so maybe 50% 1, 40% 2 and 10% 3) from each side till there are no cards left to shuffle. Then repeat 8 times or so (there's a certain minimum number of times that ensures good randomness)

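For concreteness, a literal Python sketch of that procedure (secrets is assumed as the randomness source; nothing about this is proven unbiased, as the replies note):

    import secrets

    def riffle_once(deck):
        # "Cut": rotate the deck by a random amount.
        cut = secrets.randbelow(len(deck))
        deck = deck[cut:] + deck[:cut]
        # Split roughly in half, with a bit of random variation (+/- 3).
        mid = len(deck) // 2 + secrets.randbelow(7) - 3
        left, right = deck[:mid], deck[mid:]
        # Riffle together, alternately taking 1, 2, or 3 cards
        # (50% / 40% / 10%) from each side until both piles are empty.
        out = []
        while left or right:
            for pile in (left, right):
                r = secrets.randbelow(10)
                take = 1 if r < 5 else (2 if r < 9 else 3)
                out.extend(pile[:take])
                del pile[:take]
        return out

    def riffle_shuffle(deck, rounds=8):
        for _ in range(rounds):
            deck = riffle_once(deck)
        return deck
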
replies(3): >>45155382 >>45155506 >>45156321
2. kerkeslager No.45155382
> Why not just simulate a real shuffle?

If you are asking why your proposed shuffling method is insecure: I don't know, and that's why I would never use that.

Asking "why not do X?" is entirely not paranoid enough for security. If you want to propose an algorithm, start with trying to prove the security claims of the algorithm. In this case, you'd need to prove that your algorithm creates a permutation that is indistinguishable from random. If you can't prove it, it's highly probable that your algorithm doesn't create a random permutation and someone will figure out how to break it.

I'll point out that we already have proven shuffling algorithms[1] and they're obviously faster than what you've proposed. So the simple answer to your question is "Because it's unproven and slower than proven algorithms."

[1] https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

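For reference, the whole algorithm is a few lines; a minimal Python sketch, using the stdlib's secrets CSPRNG:

    import secrets

    def fisher_yates(deck):
        # Walk from the top down, swapping each position with a uniformly
        # chosen position at or below it. secrets.randbelow is unbiased
        # and cryptographically secure, so there is no modulo bias.
        for i in range(len(deck) - 1, 0, -1):
            j = secrets.randbelow(i + 1)   # uniform over 0..i inclusive
            deck[i], deck[j] = deck[j], deck[i]
        return deck
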
replies(1): >>45155421
3. pmarreck No.45155421
From what I understand, the quality of the randomness of Fisher-Yates depends entirely on the quality of the random source AND that you didn't bias it by using modulo with a non-evenly-dividing divisor. It actually says that right in the article.

My method may not suffer as much from those drawbacks, but you're right, without testing it thoroughly, there's no way to know and it should not be relied upon instead of F-Y.

EDIT: My intuition was correct more or less. Doing it the way I described serves to "smear" any bias across the deck. Fascinating chatgpt convo on it: https://chatgpt.com/share/68bd103f-9188-8004-8cbc-86693a0d87...

Turns out that an even easier way is just to assign 128 bits of randomness to each card and sort them by that. Degrades gracefully from biased sources, apparently.

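A minimal Python sketch of that sort-by-random-key idea (secrets assumed as the bit source; as the replies below discuss, key collisions and biased sources complicate the picture):

    import secrets

    def sort_shuffle(deck):
        # Tag each card with an independent 128-bit key and sort by it.
        # sorted() is stable, so a key collision would leave those cards
        # in their original relative order (the bias discussed below).
        keys = [secrets.randbits(128) for _ in deck]
        return [card for _, card in sorted(zip(keys, deck), key=lambda t: t[0])]
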
replies(3): >>45155555 >>45156002 >>45207546
4. amluto No.45155506
You can:

(a) Use Fisher-Yates, which is fast and, if correctly implemented, unbiased, or

(b) Come up with an ad-hoc scheme like this, do some fancy math to bound its total variation distance (or, worse, some weaker concept of distance), convince yourself that the distance to the uniform distribution is acceptable, and end up with a slower algorithm when you’re done.

5. kerkeslager No.45155555
> From what I understand, the quality of the randomness of Fisher-Yates depends entirely on the quality of the random source AND that you didn't bias it by using modulo with a non-evenly-dividing divisor. It actually says that right in the article.

Yes.

Pretty much every modern language ships with a secure PRNG (which probably just calls /dev/urandom). A poker site probably has enough throughput to want to not block while waiting for /dev/urandom to build up entropy, so they might do something faster, but /dev/urandom is probably secure; it just might be slower than a big poker site needs.

The non-evenly-dividing divisor thing is a bit trickier, which is why standard libraries implement Fisher-Yates for you. But the solution is basically:

Using your PRNG, generate the number of bits you need. So if you need a number 0-51, generate 6 bits. 6 bits can hold 0-63. If you get a number in the range 52-63, discard that and generate a new 6-bit number with the PRNG.

If you need a number in an awkward range like 0-35, you'll need to discard a lot of numbers, but keep in mind this may still be faster than modular division, which is pretty dang slow.

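That discard-and-retry loop is essentially what the standard library's secrets.randbelow does internally; a minimal Python sketch:

    import secrets

    def randbelow(n):
        # Draw the minimum number of bits, then reject out-of-range values.
        k = (n - 1).bit_length()    # 6 bits for n = 52 (covers 0..63)
        while True:
            r = secrets.randbits(k)
            if r < n:               # keep 0..51, reject 52..63
                return r
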
> My method may not suffer as much from those drawbacks, but you're right, without testing it thoroughly, there's no way to know and it should not be relied upon instead of F-Y.

That's very much not what I said.

"Testing it thoroughly" is not adequate. Start by proving the algorithm is correct. If you don't have a proof the algorithm works there's no reason to even start implementing it so there's nothing to test.

> EDIT: My intuition was correct more or less. Doing it the way I described serves to "smear" any bias across the deck. Fascinating chatgpt convo on it: https://chatgpt.com/share/68bd103f-9188-8004-8cbc-86693a0d87...

Jesus Christ, no. If you still believe anything ChatGPT says then security is not the field for you.

replies(2): >>45157612 >>45158395
6. oneshtein No.45156002
> Turns out that an even easier way is just to assign 128 bits of randomness

52! is roughly 2^226.

You cannot address all ~2^226 permutations with a 128-bit value, let alone one generated from a 64-bit seed.

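For reference, the 2^226 figure is easy to verify (Python):

    import math

    math.log2(math.factorial(52))   # ~225.58, so 52! is about 2^226
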
replies(2): >>45156669 >>45158430
7. wzdd No.45156321
The article links to the paper which discusses the issue. The paper identifies two problems: a biased swapping algorithm, and a PRNG with easy-to-guess state.

For the first problem, a very simple "for each card i, pick a number between i and 52 and swap card i with the card at that position (or do nothing if it's the same card)" is proposed to eliminate bias. This seems pretty simple to verify.

For the second problem, there is no workaround except to use a better RNG.

So in the context of the paper, the reason your algorithm wouldn't have worked is that the programmers seeded their PRNG with the time of day. In the context of shuffling more generally, I don't know, but it seems that there are even simpler algorithms.

That paper: https://web.archive.org/web/20140104095330/http:/www.cigital...

8. rcxdude No.45156669
it's 128 bits per card. That's vastly more than 226 bits.
replies(1): >>45160082
9. indigodaddy No.45157612
Lol, I laughed at his confident usage of "turns out" and "apparently" after his conclusive conversation with his chatgpt expert...
replies(1): >>45158407
10. pmarreck No.45158395
And if you believe nothing ChatGPT says then you simply suffer from the opposite bias. The dilemma is that it is mostly right, most of the time, isn't it? Almost like... people? I mean, one person responded with an error that ChatGPT almost certainly wouldn't have made: https://news.ycombinator.com/item?id=45158430

If ChatGPT says things that are empirically provable, then it's not wrong, is it?

replies(2): >>45159921 >>45207239
11. pmarreck No.45158407
I never claimed it was conclusive; words like "apparently" were meant to signal that, which is why I didn't use "obviously," "definitely" or "conclusively."

Also, maybe be less of an asshole next time. People are allowed to be wrong, that is the whole point of a discussion, and ridicule is not a counterargument.

/eyeroll

replies(2): >>45158605 >>45207363
12. pmarreck No.45158430
With all due respect, I don't think you understand. The 128 bits are just to sort the 52 cards into a random-enough order, and they have sufficient entropy and space for that.

We're not trying to index all possible decks, just to sort the 52 cards into one randomized order. The more bits we assign to each card, the closer any error or bias in the source shrinks toward nil. (In contrast, consider the use of only 4 bits per card.)

Since everyone's bashing my use of ChatGPT, I don't think ChatGPT5 would have made that categorical error.

replies(1): >>45161676
13. indigodaddy No.45158605
I don't think you're either wrong or right (in fact my intuition sides with yours, I think) because I definitely know way less than you about the subject in general, and probably less than almost anyone here on HN; it's just that the trust swaths of people place in the authoritativeness of LLM outputs generally amuses me. I'm sure we'll get there one day, but we're not there yet. I clearly offended you, and I apologize.
14. tripletao No.45159921
"Mostly right" is the most insidious option, because the frequent true statements or implications will lull you into believing the infrequent false ones.

For Fisher-Yates, if the RNG is good then the output permutation is independent of the input permutation. That's not true for your proposed algorithm, and neither you nor the LLM has analyzed it to determine how quickly the sensitivity to the input permutation decays. That sensitivity will have complicated structure, so it's a nontrivial problem in cryptanalysis.

The problem where the low bits of old, non-cryptographically-secure PRNGs were particularly non-random is real. That would be a problem for any algorithm using them, though; imagine if your riffle took rand() % 2. The resolution (use the high bits, not the low bits) is the same for any algorithm. It's possible that your proposed algorithm is less sensitive because the much greater number of RNG calls makes it somehow average out, but neither you nor the LLM has analyzed that, because the residual structure is again a cryptanalysis problem.

Any potentially adversarial application should use a CSPRNG, moving the cryptographic security into that existing, well-studied block. So the problem you're trying to solve doesn't exist in practice. The LLM can probably be prompted to tell you that; but since you stated that the new shuffling algorithm was your idea, it worked to find reasons it was a good one. I don't think this is a good use of LLMs, but if you're going to then you have to avoid any suggestion that the new idea is yours, to avoid triggering the sycophancy.

15. amluto No.45160082
You could use something like 12 bits of randomness per card (a very rough approximation of log_2(n^2)) to get the probability that you reuse a number down to a manageable level, check whether you've reused a number (which is basically free once you've sorted), and then repeat the whole process if you did.

Or you could have a lazily calculated infinite-precision random number per card and use more like 6 bits per card in expectation. Other than using more (and annoyingly variable) memory, this may well be faster than a properly unbiased Fisher-Yates.

Or you could assign a few bits per card, sort, and then recurse on each group of cards that sorted into the same bucket.

In summary, there are lots of genuinely unbiased solutions (assuming a perfect binary RNG), and they all boil down to something roughly equivalent to rejection sampling.

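A Python sketch of the first variant, with secrets assumed as the bit source (with 12-bit keys and 52 cards, a collision forces a redo roughly a quarter of the time, so the expected number of passes stays small):

    import secrets

    def shuffle_by_short_keys(deck, bits=12):
        # Draw a short key per card; if any two keys collide, redo the
        # whole draw. This is rejection sampling over orderings.
        while True:
            keyed = sorted(((secrets.randbits(bits), c) for c in deck),
                           key=lambda t: t[0])
            if all(a[0] != b[0] for a, b in zip(keyed, keyed[1:])):
                return [c for _, c in keyed]
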
16. oneshtein No.45161676
128 bits x 52 cards is 6656 bits, or ~30x more than the ~226 bits needed, so it will work IF those bits are independent. If a CSPRNG such as AES-128 is used, it has a 128-bit seed, which is not enough to cover all 2^226 permutations. If the PRNG sequence is reused to generate more shuffles, then the next shuffle can be predicted from the previous one.

A CSPRNG like AES-256 with a 256-bit random seed is safe.

A CSPRNG like AES-128 with a 128-bit (or smaller) random seed is not safe.

replies(1): >>45165388
17. tripletao No.45165388
> If PRNG sequence is reused to generate more shuffles, then the next shuffle can be predicted from the previous one.

That's theoretically true by a counting argument, but for a CSPRNG the only way to compute that next shuffle costs about as much computationally as brute force. 128 bits is enough to make that infeasible, though 256 may be preferable for the usual abundance of caution.

The number of permutations doesn't matter here, since an attack becomes computationally infeasible before it becomes counting infeasible. It's also possible for a practical attack to exist even when the output can't be predicted exactly, since statistical information is still valuable.

18. kerkeslager No.45207239
> And if you believe nothing ChatGPT says then you simply suffer from the opposite bias.

No, you're making a logical error. It's not that I believe the opposite of what ChatGPT says, it's that I don't believe what ChatGPT says based on ChatGPT saying it alone. If ChatGPT says your algorithm is right, it's not that I automatically believe your algorithm is wrong, it's that I believe I still don't know it's right because ChatGPT is shit at being right.

> The dilemma is that it is mostly right, most of the time, isn't it? Almost like... people?

I mean, yeah. Which is why I don't believe things just because some random person says them, either.

I tend to assign confidence intervals to beliefs in my head, and even an expert saying something doesn't make me super confident in it. I tend to believe things strongly if I can understand the hard evidence for them, not because any person, even an expert, says it.

With security, we have tens of thousands of examples of unproven security assertions being proven wrong later with disastrous effect, so the standard of proof is very high, and ChatGPT saying something is nowhere near meeting it. The only acceptable proof of a security algorithm is mathematical proof.

> If ChatGPT says things that are empirically provable, then it's not wrong, is it?

ChatGPT also says things that are empirically disprovable, so basically you always have to go to empirical proof to tell whether what ChatGPT has said is true, making ChatGPT next-to-useless if accuracy is at all a concern. Just go to empirical proof and skip the pointless ChatGPT step.

And accuracy really, really is a very big concern in secure random shuffling algorithms, so you really, really need to empirically prove that they work, which you really, really have not done, because you stopped thinking when you trusted ChatGPT. Your shuffling algorithm is not proven, and very likely is not secure.

19. kerkeslager No.45207363
Sure, it's okay to be wrong, but it gets significantly less okay when you just keep doubling down on being wrong and refuse to admit you are wrong.

Unless I've missed something, you still seem to be under the very wrong impression that you've come up with a secure shuffling algorithm. You simply have not come up with a secure shuffling algorithm. The minimum viable evidence of security for an algorithm is mathematical proof, and you don't have that, and you don't get to sit at the security table until you do. Put bluntly: mathematical proof or GTFO.

Yes, I'm gatekeeping, and there's a very good reason for that: we know from tens of thousands of bad attempts at secure algorithms that if there isn't a proof of security it very likely is not secure, with disastrous effects. It is not a kindness to let this sort of wrong information spread, and it's not assholish to be clear that it is wrong.

EDIT: Even if you want to add "probably" or some other qualifying word to your claims, "This shuffling algorithm is probably secure" is still wrong. No, that shuffling algorithm is probably not secure. I am sorry to be a hardass here, but the idea that this is an effective approach to security needs to be deleted from your mind.

20. kerkeslager No.45207546
> Turns out that an even easier way is just to assign 128 bits of randomness to each card and sort them by that. Degrades gracefully from biased sources, apparently.

This was edited in after my original response, so I didn't respond to it previously.

I won't generally do the work to find the attacks your bad security claims are susceptible to, but this one is trivial.

This does not degrade gracefully in any way. If your biased source of randomness generates the same number more than once, the affected items in the list are assigned the same key, so that sorting creates a bias toward the pre-shuffle ordering. This sort of collision is more common than one would intuit due to the birthday paradox[1], which is why this class of attacks is known as birthday attacks. In short, the birthday paradox amplifies problems of biased randomness sources rather than degrading gracefully.

In fact, this bias toward the pre-shuffle ordering exists even with a perfect RNG; it's just extremely unusual for a collision to occur in a space of 128 bits. But Fisher-Yates doesn't have this problem, so why use an algorithm that we can easily prove is biased, even if the bias is negligible with a proper PRNG?

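The standard birthday approximation makes the scale concrete (a Python sketch; the 16-bit case is just an illustration of a badly biased source):

    import math

    def collision_prob(n=52, key_bits=128):
        # Birthday approximation: P ~= 1 - exp(-(n(n-1)/2) / 2^k)
        pairs = n * (n - 1) / 2
        return -math.expm1(-pairs / 2.0 ** key_bits)

    # collision_prob(52, 128) -> ~3.9e-36 (negligible)
    # collision_prob(52, 16)  -> ~2%: a source with only ~16 bits of
    # effective entropy per key collides about once every 50 shuffles
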
Also, I really have to question your claim that this is even easier than Fisher-Yates. Fisher-Yates is not particularly complicated. Did you even bother to learn how Fisher-Yates works before assuming you could come up with a simpler solution than the industry standard?

[1] https://en.wikipedia.org/wiki/Birthday_problem

EDIT: Another problem with all the algorithms you've proposed is that they can't be partially applied. A hand of 6-max Texas Hold'em, the most common online poker variant, requires only 2x6 + 5 = 17 cards, and by partially applying Fisher-Yates you can select just the first 17 cards of the shuffled deck without having to know or generate the other 35. This will be faster and deplete your RNG more slowly, further decreasing the chance of bias.
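A minimal Python sketch of that partial application (secrets assumed as the CSPRNG):

    import secrets

    def deal_top(deck, n=17):
        # Partial Fisher-Yates: only the first n positions are ever
        # finalized, so a 6-max hold'em deal costs 17 RNG draws, not 51.
        deck = list(deck)
        for i in range(n):
            j = i + secrets.randbelow(len(deck) - i)  # uniform over i..len(deck)-1
            deck[i], deck[j] = deck[j], deck[i]
        return deck[:n]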