
706 points janpio | 3 comments
NelsonMinar ◴[] No.45676467[source]
Be sure to see the team's whole list of Cursed Knowledge. https://immich.app/cursed-knowledge
replies(5): >>45676661 #>>45677766 #>>45677816 #>>45678252 #>>45679167 #
levkk ◴[] No.45676661[source]
The Postgres query parameters one is funny. 65k parameters is not enough for you?!
replies(2): >>45676824 #>>45677258 #
strken ◴[] No.45676824[source]
As it says, bulk inserts with large datasets can fail. Inserting a few thousand rows into a table with 30 columns will hit the limit. You might run into this if you were synchronising data between systems or running big batch jobs.
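
To make the arithmetic concrete, here's a rough TypeScript sketch (node-postgres, made-up 30-column table) of the naive multi-row INSERT that trips the limit: 65,535 bind parameters divided by 30 columns is only about 2,184 rows.

    import { Client } from "pg";

    // Naive bulk insert: one bind parameter per cell. With 30 columns,
    // anything past ~2,184 rows exceeds Postgres's 65,535-parameter
    // wire-protocol limit and the query fails. Table name is made up.
    async function naiveBulkInsert(client: Client, rows: unknown[][]) {
      const params: unknown[] = [];
      const tuples = rows.map((row) =>
        "(" +
        row
          .map((value) => {
            params.push(value);
            return "$" + params.length; // $1 ... $66000 and beyond
          })
          .join(", ") +
        ")"
      );
      // e.g. 2,200 rows x 30 columns = 66,000 parameters -> fails
      await client.query(
        `INSERT INTO wide_table VALUES ${tuples.join(", ")}`,
        params
      );
    }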

SQLite used to have a default limit of 999 query parameters, which was much easier to hit. It's now a roomy 32,766.

replies(2): >>45676893 #>>45676985 #
1. tym0 ◴[] No.45676985[source]
Right, for Postgres I would use unnest for inserting a variable number of rows.
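
Something like this sketch (node-postgres, invented table and column names): each column becomes a single array parameter, so the parameter count stays at the number of columns no matter how many rows you insert.

    import { Client } from "pg";

    // UNNEST-based insert: one array parameter per column instead of one
    // per cell, so 3 parameters total regardless of row count.
    // Table and column names are illustrative.
    async function bulkInsertAssets(
      client: Client,
      ids: string[],
      paths: string[],
      sizes: number[]
    ) {
      await client.query(
        `INSERT INTO assets (id, path, size_bytes)
         SELECT * FROM unnest($1::uuid[], $2::text[], $3::bigint[])`,
        [ids, paths, sizes]
      );
    }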
replies(1): >>45677652 #
2. strken ◴[] No.45677652[source]
In the past I've used batches of data, inserted into a separate table with all the constraints turned off and using UNNEST, and then inserted into the final table once the load was done. We ended up both batching the data and using UNNEST because it was faster while still letting us resume midway through.
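
Roughly this shape, assuming a hypothetical unconstrained assets_staging table: each batch is a small UNNEST insert, and the final move is a single INSERT ... SELECT, so a crash mid-load only loses the current batch.

    import { Client } from "pg";

    // Two-phase load: append batches to an unconstrained staging table,
    // then move everything into the real table in one statement.
    // assets_staging / assets and their columns are invented names.
    async function loadBatch(client: Client, ids: string[], paths: string[]) {
      await client.query(
        `INSERT INTO assets_staging (id, path)
         SELECT * FROM unnest($1::uuid[], $2::text[])`,
        [ids, paths]
      );
    }

    async function moveToFinalTable(client: Client) {
      await client.query("BEGIN");
      await client.query(
        "INSERT INTO assets (id, path) SELECT id, path FROM assets_staging"
      );
      await client.query("TRUNCATE assets_staging");
      await client.query("COMMIT");
    }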

We probably should have been partitioning the data instead of inserting it twice, but I never got around to fixing that.

COPY is likely a better option if you have access to the host, or provider-specific extensions like aws_s3 if you have those. I'm sure a data engineer would be able to suggest a better ETL architecture than "shove everything into postgres", too.
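
Worth noting that COPY ... FROM STDIN streams rows over the normal connection, so it doesn't strictly need filesystem access on the database host. A hedged sketch with node-postgres and the pg-copy-streams helper (table and file names invented):

    import { createReadStream } from "node:fs";
    import { pipeline } from "node:stream/promises";
    import { Client } from "pg";
    import { from as copyFrom } from "pg-copy-streams";

    // Stream a local CSV into the server with COPY ... FROM STDIN.
    // Only a connection is needed, not a shell on the database host.
    async function copyCsv(client: Client, path: string) {
      const ingest = client.query(
        copyFrom(
          "COPY assets (id, path, size_bytes) FROM STDIN WITH (FORMAT csv)"
        )
      );
      await pipeline(createReadStream(path), ingest);
    }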

replies(1): >>45678736 #
3. devjab ◴[] No.45678736[source]
Was MERGE too slow/expensive? We tend to MERGE from staging or temporary tables when we sync big data sets. If we were on postgres I think we'd use ... ON CONFLICT, but MERGE does work.
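
For anyone following along: Postgres has supported MERGE since version 15, and the older idiom is INSERT ... ON CONFLICT. A rough sketch of the upsert-from-staging step (invented table and columns):

    import { Client } from "pg";

    // Upsert from a staging table with INSERT ... ON CONFLICT; on
    // Postgres 15+ a MERGE statement can express the same sync.
    // Table and column names are invented for the example.
    async function upsertFromStaging(client: Client) {
      await client.query(`
        INSERT INTO assets (id, path, size_bytes)
        SELECT id, path, size_bytes FROM assets_staging
        ON CONFLICT (id) DO UPDATE
          SET path = EXCLUDED.path,
              size_bytes = EXCLUDED.size_bytes
      `);
    }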