Nice to see methodology here. Ideally LanceDB's Lance v2 and Nimble would both be represented here as well. It feels like there's a huge appetite to do better than Parquet; ideally work like this would help inform where we go next.
There is also Vortex (https://github.com/fulcrum-so/vortex). That has modern encoding schemes that we want to use.
BtrBlocks (https://github.com/maxi-k/btrblocks) from the Germans is another Parquet alternative.
Nimble (formerly Alpha) is a complicated story. We worked with the Velox team for over a year to open-source and extend it, but our plans were stymied by legal. This was in collaboration with Meta + CWI + Nvidia + Voltron. We decided to go a separate path because the Nimble code has no spec or docs and is too tightly coupled with Velox/Folly.
Given that, we are working on a new file format. We hope to share our ideas/code later this year.