Parquet and other columnar storage formats are essentially already tuned for that.
A lot of requests shouldn't in themselves be that bad with CloudFront nowadays, since you get both low latency and, with HTTP/2, a low-overhead RPC channel.
There are some potential remedies, but each comes with significant architectural impact:
- Bigger range queries: for smallish tables, instead of doing point-based access for individual rows, retrieve bigger chunks at once and scan through them locally -> fewer requests, but likely also more wasted bandwidth
- Compute the specific view live with a remote DuckDB -> has the downside of introducing a DuckDB instance between the browser and S3 that you have to manage
- Precompute the data you are interested in into new Parquet files -> only works if you can anticipate the query patterns well enough (see the sketch after this list)
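To make the precompute option concrete, here is a minimal sketch using the DuckDB Python API with the httpfs extension. The bucket, paths, and columns are hypothetical placeholders, and it assumes S3 credentials are already configured for DuckDB:

```python
import duckdb

# One in-process DuckDB connection; httpfs lets it read/write S3 directly.
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

# Hypothetical paths and columns -- adjust to your own layout.
# Materialize the anticipated query into a much smaller Parquet file,
# so the browser only has to range-read the precomputed result.
con.execute("""
    COPY (
        SELECT day, country, count(*) AS events
        FROM read_parquet('s3://my-bucket/events.parquet')
        GROUP BY day, country
    ) TO 's3://my-bucket/precomputed/events_by_day_country.parquet'
      (FORMAT PARQUET);
""")
```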
I read in the sibling comment that your main issue seems to be the re-reading of metadata. DuckDB is AFAIK able to cache that metadata, but not across instances. I've seen someone hit the same issue, and the problem was that they only created short-lived in-memory DuckDB instances (a new one every time they wanted to run a query), so the fresh DB had to retrieve the metadata again each time.
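If that's the case here too, the fix is simply to keep one connection alive for the lifetime of the app instead of opening a fresh in-memory DB per query. A rough Python sketch (the URL is a placeholder):

```python
import duckdb

# Anti-pattern: a fresh in-memory DB per query means an empty metadata cache,
# so the Parquet footer/metadata has to be fetched over HTTP every single time.
def query_fresh(sql: str):
    con = duckdb.connect()
    con.execute("INSTALL httpfs; LOAD httpfs;")
    return con.execute(sql).fetchall()

# Better: one long-lived connection, created once at startup and reused,
# so remote file metadata can stay cached between queries.
CON = duckdb.connect()
CON.execute("INSTALL httpfs; LOAD httpfs;")

def query(sql: str):
    return CON.execute(sql).fetchall()

# Example (placeholder URL):
# query("SELECT count(*) FROM read_parquet('https://example.cloudfront.net/data.parquet')")
```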