245 points gatesn | 1 comment
kwillets ◴[] No.41843590[source]
Does this fragment columns into rowgroups like Parquet, or is it more of a pure columnstore? IME a data warehouse works much better if each column isn't split into thousands of fragments.
replies(1): >>41844366 #
1. danking00 ◴[] No.41844366[source]
Yeah, you and we are on the same page (heh). We don’t want the format to require row grouping. The file format has a layout schema written in a footer. A row-group-style layout is supported but not required. The layout specification will probably evolve, but currently the in-memory structure becomes the on-disk structure. So, if you have a ChunkedArray of StructArray of ChunkedArray, you’ll get row groups and pages within them. If you have a StructArray of ChunkedArray, you’ll just get per-column pages.
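
To make the nesting rule concrete, here is a rough toy model in Python. It is not the real library API: the classes `Primitive`, `Chunked`, and `Struct` and the function `describe_layout` are hypothetical stand-ins invented for illustration. It only shows the idea that chunking above the struct behaves like row groups, while chunking below it behaves like per-column pages.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


# Hypothetical stand-ins for the array types named in the comment above.
@dataclass
class Primitive:
    """A flat run of values (a leaf array)."""
    values: List[int]


@dataclass
class Chunked:
    """A sequence of chunks that together form one logical array."""
    chunks: List["Array"]


@dataclass
class Struct:
    """Named columns of equal length."""
    fields: Dict[str, "Array"]


Array = Union[Primitive, Chunked, Struct]


def describe_layout(array: Array, in_column: bool = False) -> str:
    """Mirror the in-memory nesting as a nested layout description."""
    if isinstance(array, Primitive):
        return f"flat[{len(array.values)}]"
    if isinstance(array, Struct):
        cols = ", ".join(
            f"{name}: {describe_layout(child, in_column=True)}"
            for name, child in array.fields.items()
        )
        return f"columns({cols})"
    # Chunked: outside a struct the chunks act like row groups,
    # inside a struct field they act like pages of one column.
    kind = "pages" if in_column else "row_groups"
    parts = ", ".join(describe_layout(c, in_column) for c in array.chunks)
    return f"{kind}({parts})"


# StructArray of ChunkedArray: per-column pages only.
columnar = Struct({
    "a": Chunked([Primitive([1, 2]), Primitive([3])]),
    "b": Chunked([Primitive([1, 2, 3])]),
})

# ChunkedArray of StructArray of ChunkedArray: row groups with pages inside.
row_grouped = Chunked([
    columnar,
    Struct({"a": Chunked([Primitive([4])]), "b": Chunked([Primitive([4])])}),
])

print(describe_layout(columnar))
print(describe_layout(row_grouped))
```

The first call describes a layout with no row groups at all, which is the pure column store case asked about in the parent comment.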

I’m working on the Python API now. I think we probably want the user to specify, on write, whether they want row groups or not and then we can enforce that as we write.
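
One way a write-time choice like that could look, sketched in plain Python. Everything here is an assumption for illustration: `plan_write`, `page_rows`, and `row_group_rows` are made-up names, not the Python API under development. The caller either passes a row group size, and gets row groups enforced, or leaves it out and gets one contiguous run of pages per column.

```python
from typing import Dict, List, Optional


def plan_write(columns: Dict[str, List[int]], page_rows: int,
               row_group_rows: Optional[int] = None) -> list:
    """Plan a write: per-column pages, or row groups that each hold column pages.

    Hypothetical helper; the names and return shape are illustrative only.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop() if lengths else 0

    def pages(values: List[int]) -> List[List[int]]:
        # Split one column (or one row group's slice of it) into fixed-size pages.
        return [values[i:i + page_rows] for i in range(0, len(values), page_rows)]

    if row_group_rows is None:
        # No row groups requested: each column is written as one run of pages.
        return [("column", name, pages(values)) for name, values in columns.items()]

    # Row groups requested: slice every column at the same row boundaries.
    plan = []
    for start in range(0, n_rows, row_group_rows):
        group = {
            name: pages(values[start:start + row_group_rows])
            for name, values in columns.items()
        }
        plan.append(("row_group", start, group))
    return plan


cols = {"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]}
print(plan_write(cols, page_rows=2))                     # columnar plan
print(plan_write(cols, page_rows=2, row_group_rows=4))   # row-grouped plan
```

Making the row group size an explicit argument means the writer can reject or reshape input that does not match what the user asked for, rather than inheriting whatever chunking the in-memory array happens to have.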