
245 points gatesn | 1 comments
Havoc ◴[] No.41840621[source]
Can one edit it in place?

That’s the main thing currently irritating me about parquet

replies(2): >>41841001 #>>41845342 #
aduffy ◴[] No.41841001[source]
You're unlikely to find this with any analytic file format (including Vortex). The main reason is that OLAP systems generally assume an immutable distributed object/block layer (S3, HDFS, ABFS, etc.).

It's then generally up to a higher-level component called a table format to handle the idea of edits. See, for example, how Apache Iceberg handles deletes: https://iceberg.apache.org/spec/#row-level-deletes
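
To make the split between immutable files and a table format concrete, here is a minimal sketch of the position-delete idea: the data file is never rewritten, and a separate set of (file path, row position) entries tells readers which rows to skip. The names `DataFile`, `Table`, `delete_where` and `scan` are made up for illustration and are not Iceberg's API; the real spec also covers equality deletes, manifests and snapshots, none of which appear here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for an immutable data file (e.g. one Parquet file)."""
    path: str
    rows: tuple  # never modified after the file is written


@dataclass
class Table:
    """Hypothetical table-format layer: data files plus a set of position deletes."""
    data_files: list = field(default_factory=list)
    position_deletes: set = field(default_factory=set)  # entries are (path, pos)

    def delete_where(self, predicate):
        # Record which positions are deleted; the data files stay untouched.
        for f in self.data_files:
            for pos, row in enumerate(f.rows):
                if predicate(row):
                    self.position_deletes.add((f.path, pos))

    def scan(self):
        # Merge-on-read: skip any row whose (path, pos) has a delete entry.
        for f in self.data_files:
            for pos, row in enumerate(f.rows):
                if (f.path, pos) not in self.position_deletes:
                    yield row


table = Table([
    DataFile("part-0", (
        {"user": "a", "v": 1},
        {"user": "b", "v": 2},
        {"user": "a", "v": 3},
    )),
])
table.delete_where(lambda r: r["user"] == "a")
remaining = list(table.scan())
```

Note that the deleted rows are hidden from readers but are still physically present in `part-0` until something rewrites the file.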

replies(2): >>41841714 #>>41848589 #
slotrans ◴[] No.41841714[source]
This is true, and in principle a good thing, but GDPR and CCPA have come into existence since Parquet and ORC were created. Any format we build in that space today needs to support in-place record-level deletion.
replies(3): >>41841878 #>>41844443 #>>41846674 #
1. mkesper ◴[] No.41846674{3}[source]
You can avoid that if you only store content encrypted with a per-user key (expensive, I know). That way you should only have to revoke that key to remove access to the data. The advantage is that you cannot forget any old backup, etc.
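
A rough sketch of the per-user key idea (often called crypto-shredding), under loud assumptions: `KeyStore`, `encrypt` and `decrypt` are hypothetical names, and the cipher is a toy SHA-256 keystream with an HMAC tag, used only so the example runs on the standard library. A real system would use a vetted AEAD such as AES-GCM and a proper key management service.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key, nonce and a counter. Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body  # layout: 16-byte nonce, 32-byte tag, ciphertext


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class KeyStore:
    """Hypothetical per-user key store; the only mutable piece in the design."""

    def __init__(self):
        self._keys = {}

    def key_for(self, user_id: str) -> bytes:
        # Create the user's key on first use.
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id: str) -> bytes:
        return self._keys[user_id]  # raises KeyError once the key is revoked

    def revoke(self, user_id: str) -> None:
        self._keys.pop(user_id, None)


keys = KeyStore()
record = encrypt(keys.key_for("alice"), b"alice@example.com")
backup = bytes(record)  # stands in for a copy sitting in some old backup

readable_before = decrypt(keys.get("alice"), backup)
keys.revoke("alice")  # the "delete": no file or backup is touched
```

After `revoke`, every copy of the record, including `backup`, is unreadable because the key no longer exists. This only holds if the key store itself is kept out of the immutable files and backups.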