
245 points | gatesn | 2 comments
the_mitsuhiko No.41840459
> One of the unique attributes of the (in-progress) Vortex file format is that it encodes the physical layout of the data within the file's footer. This allows the file format to be effectively self-describing and to evolve without breaking changes to the file format specification.

That is quite interesting. One general challenge with Parquet and Arrow in the OTel / observability ecosystem is that the shape of span data is not known up front: spans carry arbitrary attributes, and those can change. To the best of my knowledge, no particularly great solution exists today for encoding this. I wonder to what degree this system could be "abused" for that.
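
To make the problem concrete, here is a rough sketch of two common workarounds when attribute keys vary per span: widening to a union of nullable columns, or falling back to a fixed-schema list of key/value pairs. The span data and the helper names (`union_columns`, `as_key_value_pairs`) are made up for illustration; this is plain Python and says nothing about how Vortex, Parquet, or Arrow actually lay the data out.

```python
# Hypothetical spans: the attribute keys differ from one span to the next.
spans = [
    {"name": "GET /users",
     "attributes": {"http.status_code": 200, "http.method": "GET"}},
    {"name": "db.query",
     "attributes": {"db.system": "postgresql", "db.rows": 12}},
]


def union_columns(spans):
    """Workaround 1: one nullable column per attribute key seen in the batch.

    The column set depends on the batch, so the schema drifts over time.
    """
    keys = sorted({k for s in spans for k in s["attributes"]})
    rows = [[s["attributes"].get(k) for k in keys] for s in spans]
    return keys, rows


def as_key_value_pairs(span):
    """Workaround 2: a fixed schema of (key, type tag, stringified value).

    The schema is stable, but per-attribute typing and columnar access are lost.
    """
    return [(k, type(v).__name__, str(v))
            for k, v in sorted(span["attributes"].items())]


columns, rows = union_columns(spans)
```

Neither option is great: the first produces sparse, ever-growing schemas, and the second throws away most of what a columnar format is good at.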

1. gigatexal No.41840665
As someone who works in data, schema-on-read formats like Parquet are amazing. I hate having to guess schemas with CSVs.
2. physicsguy No.41841700
Pandera is quite nice for at least forcing validation in pandas for this.
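
The general idea of enforcing a declared schema on a CSV, rather than guessing one, can be sketched with just the standard library. To be clear, this is not Pandera's API; the column names and the `read_validated` helper are assumptions made up for the example.

```python
import csv
import io

# Hypothetical schema: column name -> converter that raises on bad input.
SCHEMA = {"user_id": int, "score": float, "country": str}


def read_validated(text, schema=SCHEMA):
    """Parse CSV text, failing loudly instead of guessing column types."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames is None or set(reader.fieldnames) != set(schema):
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append({col: conv(raw[col]) for col, conv in schema.items()})
        except (TypeError, ValueError) as exc:
            # TypeError covers short rows, where DictReader fills in None.
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows
```

A dedicated validation library adds much more on top (value ranges, nullability, DataFrame integration), but the basic shape is the same: declare the schema once and reject anything that does not fit.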