←back to thread

Sampling with SQL

(blog.moertel.com)
175 points thunderbong | 1 comments | | HN request time: 0.001s | source
Show context
bastawhiz ◴[] No.41903826[source]
> ORDER BY -LN(1.0 - RANDOM()) / weight

Isn't this the same as just

> ORDER BY -LN(RANDOM()) / weight

replies(1): >>41904029 #
tmoertel ◴[] No.41904029[source]
Yes, except that it avoids the possibility of a floating-point error. In most SQL implementations, `RANDOM()` can return zero, but the log of 0 is undefined.

From the article:

> One last numerical subtlety. Why do we generate random numbers with the expression `1.0 - RANDOM()` instead of just `RANDOM()`? Since most implementations of RANDOM(), such as the PCG implementation used by DuckDB, return a floating-point value in the semi-closed range [0, 1), they can theoretically return zero. And we don’t want to take the logarithm of zero. So we instead use `1.0 - RANDOM()` to generate a random number in the semi-closed range (0, 1], which excludes zero.

replies(1): >>41904309 #
1. bastawhiz ◴[] No.41904309[source]
Got it, I'd missed that. Thank you