So it’s a toolkit written in Rust. It is not a file format.
That said, the immediate next line in the README perhaps clarifies a bit?
"Vortex is designed to be to columnar file formats what Apache DataFusion is to query engines (or, analogously, what LLVM + Clang are to compilers): a highly extensible & extremely fast framework for building a modern columnar file format, with a state-of-the-art, "batteries included" reference implementation."
It’s a framework for building file formats. This does not indicate that Vortex is, itself, a file format.
Perhaps we should clean up the wording in the intro, but yes, there is in fact a file format!
We actually built the toolkit first, before building the file format. The interesting thing here is that we have a consistent in-memory and on-disk representation of compressed, typed arrays.
This is nice for a couple of reasons:
(a) It makes it really easy to test out new compression algorithms and compute functions. We just implement a new codec and it's automatically available for the file format.
(b) We spend a lot of energy on efficient push down. Many compute functions such as slicing and cloning are zero-cost, and all compute operations can execute directly over compressed data.
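To make (a) and (b) a bit more concrete, here is a rough sketch of the general idea. It is not Vortex's actual API: `EncodedArray`, `RunEndArray`, and every method name here are hypothetical, and run-end encoding is just a convenient stand-in codec. It shows a codec implementing a small shared trait, slicing and cloning that only adjust an offset and bump reference counts, and compute functions that work on the runs without decompressing.

```rust
use std::sync::Arc;

/// Hypothetical trait (not a Vortex type): the operations every codec
/// in this sketch must support over its own encoded representation.
trait EncodedArray {
    fn len(&self) -> usize;
    fn scalar_at(&self, index: usize) -> i64;
    fn sum(&self) -> i64;
}

/// Hypothetical run-end encoded i64 array. The buffers are shared via
/// `Arc`, and `offset`/`len` select a logical window over them.
#[derive(Clone)]
struct RunEndArray {
    values: Arc<Vec<i64>>,     // one value per run
    run_ends: Arc<Vec<usize>>, // exclusive end position of each run
    offset: usize,
    len: usize,
}

impl RunEndArray {
    /// Compress a plain slice into runs of equal values.
    fn encode(data: &[i64]) -> Self {
        let mut values: Vec<i64> = Vec::new();
        let mut run_ends: Vec<usize> = Vec::new();
        for (i, &v) in data.iter().enumerate() {
            if values.last() == Some(&v) {
                // Same value as the current run: extend it.
                *run_ends.last_mut().unwrap() = i + 1;
            } else {
                values.push(v);
                run_ends.push(i + 1);
            }
        }
        Self {
            values: Arc::new(values),
            run_ends: Arc::new(run_ends),
            offset: 0,
            len: data.len(),
        }
    }

    /// Slice without copying: share the buffers and narrow the window.
    fn slice(&self, start: usize, end: usize) -> Self {
        assert!(start <= end && end <= self.len);
        Self {
            values: Arc::clone(&self.values),
            run_ends: Arc::clone(&self.run_ends),
            offset: self.offset + start,
            len: end - start,
        }
    }
}

impl EncodedArray for RunEndArray {
    fn len(&self) -> usize {
        self.len
    }

    /// Binary search the run ends instead of decompressing.
    fn scalar_at(&self, index: usize) -> i64 {
        assert!(index < self.len);
        let pos = self.offset + index;
        // Index of the first run whose end lies past `pos`.
        let run = self.run_ends.partition_point(|&end| end <= pos);
        self.values[run]
    }

    /// Sum run by run: value times the part of the run inside the window.
    fn sum(&self) -> i64 {
        let (lo, hi) = (self.offset, self.offset + self.len);
        let mut total = 0;
        let mut run_start = 0;
        for (&v, &run_end) in self.values.iter().zip(self.run_ends.iter()) {
            let s = run_start.max(lo);
            let e = run_end.min(hi);
            if s < e {
                total += v * (e - s) as i64;
            }
            run_start = run_end;
        }
        total
    }
}
```

In this sketch, `RunEndArray::encode(&[7, 7, 7, 2, 2, 9]).slice(2, 5)` still points at the same two buffers as the original array, and `sum()` on it touches three runs rather than materializing the three elements. A real implementation would be generic over types and far more careful, but the shape of the idea is the same: add a codec, implement the shared operations, and everything built on top can use it.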
Highly encourage you to check out the vortex-serde crate in the repo for file format things, and the vortex-datafusion crate for some examples of integrating the format into a query engine!