
157 points craigkerstiens | 4 comments
linuxhansl ◴[] No.41873697[source]
Parquet by itself is not that interesting. The extension should be able to read (and even write) Iceberg tables.

Also, how does it compare to pg_duckdb (which adds DuckDB execution to Postgres, including reading Parquet and Iceberg), or duck_fdw (which wraps a DuckDB database that can be in-memory and act purely as a pass-through to Iceberg/Parquet tables)?

replies(3): >>41874044 #>>41874177 #>>41876793 #
AdamProut ◴[] No.41874044[source]
Had a similar thought. Azure Postgres has something similar to pg_parquet (pg_azure_storage), but we're looking into replacing it with pg_duckdb, assuming the extension continues to mature.

It would be great if the Postgres community could get behind one good open source extension for the various columnstore data use cases (querying data stored in an open columnstore format such as Delta or Iceberg being one of them). pg_duckdb seems to have the best chance of becoming the go-to extension for this.

replies(1): >>41874183 #
1. mslot ◴[] No.41874183[source]
Fun fact, I created pg_azure_storage :)
replies(2): >>41877261 #>>41918183 #
2. brinox ◴[] No.41877261[source]
I was just wondering if pg_parquet could be combined with pg_azure_storage to write Parquet files to Azure Storage.

I had problems with pg_azure_storage in the past because the roles pg_read_server_files and pg_write_server_files cannot be assigned on Azure PostgreSQL databases, which makes `COPY {FROM,TO}` impossible to use.

replies(1): >>41877608 #
3. mslot ◴[] No.41877608[source]
Azure is not supported as a backend in pg_parquet right now, but it shouldn't be hard to add (contributions welcome!).

https://github.com/CrunchyData/pg_parquet

It would not be safe to let any user access object storage. Therefore, pg_parquet has two roles, parquet_object_store_read and parquet_object_store_write, that give permission to COPY FROM/TO object storage (but not the local file system).

In pg_azure_storage there is a comparable azure_storage_admin role that must be granted to users who need Azure Blob Storage permission.
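
To make the role-based setup above concrete, here is a minimal Python sketch that only builds the SQL an administrator might run; it does not connect to a database. The role names come from the comment above. Everything else is an assumption for illustration: the helper names, the user `analyst`, the table `events`, the bucket URI, the mapping of the read role to COPY FROM and the write role to COPY TO, and the `WITH (format 'parquet')` option, which should be checked against the pg_parquet README.

```python
from typing import List

# Role names as given in the thread; the read/write mapping below is an assumption.
READ_ROLE = "parquet_object_store_read"    # assumed to cover COPY ... FROM object storage
WRITE_ROLE = "parquet_object_store_write"  # assumed to cover COPY ... TO object storage


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a string literal, doubling embedded single quotes.

    Assumes standard_conforming_strings is on (the PostgreSQL default).
    """
    return "'" + value.replace("'", "''") + "'"


def grant_statements(user: str, read: bool = True, write: bool = False) -> List[str]:
    """Build GRANT statements giving `user` the requested object-store roles."""
    roles = []
    if read:
        roles.append(READ_ROLE)
    if write:
        roles.append(WRITE_ROLE)
    return [f"GRANT {role} TO {quote_ident(user)};" for role in roles]


def copy_to_statement(table: str, uri: str) -> str:
    """Build a COPY ... TO statement that exports a table as Parquet.

    The option syntax is an assumption; verify it against the extension's docs.
    """
    return f"COPY {quote_ident(table)} TO {quote_literal(uri)} WITH (format 'parquet');"


if __name__ == "__main__":
    # Hypothetical user, table, and bucket, used only for illustration.
    for stmt in grant_statements("analyst", read=True, write=True):
        print(stmt)
    print(copy_to_statement("events", "s3://mybucket/events.parquet"))
```

The point of the design is that a user never needs pg_read_server_files or pg_write_server_files: access to object storage is gated by its own pair of roles, so granting it does not also open up the server's local file system.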

4. dektol ◴[] No.41918183[source]
Is pg_azure_storage available on GitHub?