In Vortex, we've specifically invested in high-throughput compression techniques that admit O(1) random access. These techniques are also sometimes called "lightweight compression"; the DuckDB folks have a good writeup [1] on the common ones.
[1] https://duckdb.org/2022/10/28/lightweight-compression.html
https://blog.acolyer.org/2018/09/26/the-design-and-implement...
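To make the O(1) random access concrete, here's a minimal Rust sketch of one such lightweight encoding: frame-of-reference with fixed-width bit-packing. (The names and layout here are illustrative, not Vortex's actual API.) Because every delta occupies the same number of bits, element i sits at a computable bit offset and can be fetched without decompressing the rest of the block:

    // A minimal sketch of frame-of-reference (FoR) + fixed-width bit-packing.
    // Illustrative only -- not Vortex's actual types or layout.
    struct ForBlock {
        reference: u64,   // block minimum, stored once
        bit_width: u32,   // bits per packed delta
        packed: Vec<u64>, // deltas packed back-to-back
    }

    fn for_encode(values: &[u64]) -> ForBlock {
        let reference = values.iter().copied().min().unwrap_or(0);
        let max_delta = values.iter().map(|&v| v - reference).max().unwrap_or(0);
        // Bits needed for the widest delta (at least 1 to keep the math simple).
        let bit_width = (64 - max_delta.leading_zeros()).max(1);
        let total_bits = values.len() * bit_width as usize;
        let mut packed = vec![0u64; (total_bits + 63) / 64];
        for (i, &v) in values.iter().enumerate() {
            let delta = v - reference;
            let bit_pos = i * bit_width as usize;
            let (word, shift) = (bit_pos / 64, bit_pos % 64);
            packed[word] |= delta << shift;
            if shift + bit_width as usize > 64 {
                // Delta straddles a word boundary: spill the high bits.
                packed[word + 1] |= delta >> (64 - shift);
            }
        }
        ForBlock { reference, bit_width, packed }
    }

    // O(1) random access: element i lives at bit offset i * bit_width,
    // so a point lookup needs no scan and no block decompression.
    fn for_get(block: &ForBlock, i: usize) -> u64 {
        let w = block.bit_width as usize;
        let bit_pos = i * w;
        let (word, shift) = (bit_pos / 64, bit_pos % 64);
        let mut delta = block.packed[word] >> shift;
        if shift + w > 64 {
            delta |= block.packed[word + 1] << (64 - shift);
        }
        let mask = if w == 64 { u64::MAX } else { (1u64 << w) - 1 };
        block.reference + (delta & mask)
    }

A full scan just walks the deltas sequentially, and a point lookup (say, after a selective filter) pays a couple of shifts and an add instead of inflating an entire compressed block.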
The compression and decompression throughputs of Vortex (and other lightweight compression schemes) are similar to or better than Parquet's for many common datasets. Unlike Zstd or Blosc, the lightweight encodings are generally both computationally simple and SIMD-friendly. We're seeing multiple gibibytes per second on an M2 MacBook Pro across various datasets in the PBI (Public BI) benchmark [1].
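As a sketch of why these encodings are so SIMD-friendly (again, illustrative rather than Vortex's actual kernel): once deltas are unpacked to a fixed width, decoding a whole block is a single branch-free add loop that compilers readily auto-vectorize.

    // Illustrative only: decoding a FoR block whose deltas fit in 16 bits.
    // The loop body is one widening add with no branches or data-dependent
    // control flow, so it compiles down to straightforward SIMD adds.
    fn for_decode_u16(reference: u64, deltas: &[u16], out: &mut Vec<u64>) {
        out.clear();
        out.extend(deltas.iter().map(|&d| reference + u64::from(d)));
    }

Compare that to a general-purpose entropy coder, where decoding is inherently serial and branchy.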
The key insight is that most data we all work with has common patterns that don't require sophisticated, heavyweight compression algorithms. Let's take advantage of that fact to free up more cycles for compute kernels!