←back to thread

Sampling with SQL

(blog.moertel.com)
175 points thunderbong | 1 comments | | HN request time: 0.001s | source
Show context
Horffupolde ◴[] No.41903010[source]
ORDERing by RANDOM on a large table requires a full scan, even with LIMIT. For tables that don’t fit in memory this is impractical and I/O shoots to 100%.
replies(4): >>41904118 #>>41904953 #>>41906210 #>>41906816 #
1. crazygringo ◴[] No.41906210[source]
Yup. For any kind of production app with a table that isn't tiny, you should absolutely never do an ORDER BY RANDOM().

You just need to make sure you're running these queries against a database used solely for occasional analytics, where it's no problem to be saturating the disk for two minutes or whatever, because it won't bother anybody else.