
bradleybuda:
I really wish data engineers didn't have to hand-roll incremental materialization in 2024. This is really hard stuff to get right (as the post outlines) but it is absolutely critical to keeping latency and costs down if you're going to go all in on deep, layered, fine-grained transformations (which still seems to me to be the best way to scale a large / complex analytics stack).
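
For anyone who hasn't had to hand-roll this, here's a minimal sketch of the usual watermark-and-upsert pattern. The table names (events, daily_revenue, etl_state) are hypothetical, sqlite3 is used only to keep it self-contained, and it assumes daily_revenue(day PRIMARY KEY, revenue) and etl_state(model PRIMARY KEY, watermark); real pipelines do the same thing with MERGE on a warehouse.

    import sqlite3

    def refresh_daily_revenue(con: sqlite3.Connection) -> None:
        """Incrementally refresh daily_revenue from events newer than the watermark."""
        # 1. Read the high-water mark left by the last successful run.
        (watermark,) = con.execute(
            "SELECT COALESCE(MAX(watermark), '1970-01-01') FROM etl_state "
            "WHERE model = 'daily_revenue'"
        ).fetchone()

        # 2. Recompute every day touched by new rows, then upsert those days.
        #    Aggregating only the new rows would silently drop earlier rows from
        #    the same day; this is the kind of subtle bug hand-rolling invites.
        con.execute(
            """
            INSERT INTO daily_revenue (day, revenue)
            SELECT date(updated_at) AS day, SUM(amount) AS revenue
            FROM events
            WHERE date(updated_at) IN (
                SELECT DISTINCT date(updated_at) FROM events WHERE updated_at > ?
            )
            GROUP BY date(updated_at)
            ON CONFLICT(day) DO UPDATE SET revenue = excluded.revenue
            """,
            (watermark,),
        )

        # 3. Advance the watermark. Late-arriving data, deletes, and backfills
        #    all need extra handling on top of this, which is where the real
        #    complexity lives.
        (new_mark,) = con.execute("SELECT MAX(updated_at) FROM events").fetchone()
        if new_mark is not None:
            con.execute(
                "INSERT INTO etl_state (model, watermark) VALUES ('daily_revenue', ?) "
                "ON CONFLICT(model) DO UPDATE SET watermark = excluded.watermark",
                (new_mark,),
            )
        con.commit()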

My prediction a few years back was that Materialize (or similar tech) would magically solve this - data teams could operate in terms of pure views and let the database engine differentiate their SQL and determine how to apply incremental (ideally streaming) updates through the view stack. While I'm in an adjacent space, I don't do this day-to-day so I'm not quite sure what's holding back adoption here - maybe in a few years more we'll get there.
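
To make the contrast concrete, here's roughly what the "pure views" version looks like. Materialize speaks the Postgres wire protocol so psycopg2 works, but the connection string, port, and the events source below are placeholder assumptions rather than a drop-in recipe.

    import psycopg2  # Materialize is Postgres wire-compatible

    # Placeholder connection details and a hypothetical upstream `events` source.
    conn = psycopg2.connect("postgresql://materialize@localhost:6875/materialize")
    conn.autocommit = True

    with conn.cursor() as cur:
        # The whole incremental model collapses to one declarative statement;
        # the engine is responsible for keeping it up to date as `events` changes.
        cur.execute("""
            CREATE MATERIALIZED VIEW daily_revenue AS
            SELECT date_trunc('day', updated_at) AS day, sum(amount) AS revenue
            FROM events
            GROUP BY 1
        """)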

matthewhelm:
I wholeheartedly agree. When I worked at Shopify, we had to hand-roll our incremental data models using Spark, and the complexity of managing deep DAGs made tasks like backfilling and refactoring a huge pain. Tools like dbt and SQLMesh face similar challenges.

The chaos of existing approaches was a large part of what drove me to join Materialize. With Materialize, you can run dbt on “easy-mode”: Materialize handles the incremental logic for you, removing the usual processing-time headaches and keeping everything up to date within a second or two.
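
As a rough illustration (placeholder names and connection details, not the exact setup from the talk): once the model exists as a materialized view, reading fresh results is just a query; there's no scheduled incremental run to wait on.

    import time
    import psycopg2

    # Placeholder connection details; assumes a daily_revenue materialized view
    # like the one sketched in the comment above.
    conn = psycopg2.connect("postgresql://materialize@localhost:6875/materialize")
    conn.autocommit = True

    with conn.cursor() as cur:
        for _ in range(5):
            # No refresh step and no incremental run: the view is maintained
            # continuously, so a plain query returns fresh results.
            cur.execute("SELECT max(day), sum(revenue) FROM daily_revenue")
            print(time.strftime("%H:%M:%S"), cur.fetchone())
            time.sleep(1)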

I recently gave a talk at Data Council about this unlock (it’s total magic): https://youtu.be/pLb5sFZ7nWw

For anyone interested, my colleague Seth also discussed this in a recent blog post: https://materialize.com/blog/migrating-postgres-materialize/