←back to thread

Sampling with SQL

(blog.moertel.com)
175 points thunderbong | 1 comments | | HN request time: 0.529s | source
1. oli5679 ◴[] No.41906905[source]
One thing that can be useful with sampling is sampling a consistent but growing sub population. This can help maintain a consistent holdout for machine learning models, help you sample analytical data and test joins without null issues etc.

If you use a deterministic hash, like farm_fingerprint on your id column (e. g user_id) and keep if modulus N = 0, you will keep the same growing list of users across runs and queries to different tables.