226 points meetpateltech | 1 comments

Recent and related: AWS multiple services outage in us-east-1 - https://news.ycombinator.com/item?id=45640838 (2045 comments)
shayonj No.45679256
I was kinda surprised by the lack of CAS (compare-and-swap) on a per-endpoint plan version, or of patterns like rejecting stale writes via 2PC or a single-writer lease per endpoint.

Definitely a painful one with good learnings, and kudos to AWS for being so transparent and detailed :hugops:
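A minimal sketch of the per-endpoint plan-version CAS mentioned above, assuming a store that can do an atomic conditional write. Everything here is hypothetical: `EndpointStore` is an in-memory stand-in for whatever would hold the live plan version, and a lock plays the role of the store's atomic compare-and-swap. A writer passes the version it read; if another writer got there first, the write is rejected as stale instead of silently overwriting.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class EndpointState:
    # Hypothetical per-endpoint record: the live plan version and its DNS records.
    plan_version: int = 0
    records: dict = field(default_factory=dict)


class EndpointStore:
    """In-memory stand-in for a store with an atomic per-endpoint compare-and-swap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict = {}

    def get(self, endpoint: str) -> EndpointState:
        with self._lock:
            cur = self._state.get(endpoint, EndpointState())
            # Return a copy so callers cannot mutate live state without going through the CAS.
            return EndpointState(cur.plan_version, dict(cur.records))

    def compare_and_swap(self, endpoint: str, expected_version: int,
                         new_version: int, records: dict) -> bool:
        """Install `records` only if the live version still equals `expected_version`
        and `new_version` is strictly newer; otherwise reject the write as stale."""
        with self._lock:
            cur = self._state.setdefault(endpoint, EndpointState())
            if cur.plan_version != expected_version or new_version <= cur.plan_version:
                return False
            cur.plan_version = new_version
            cur.records = dict(records)
            return True


store = EndpointStore()
seen_by_a = store.get("ep").plan_version  # writer A reads version 0
seen_by_b = store.get("ep").plan_version  # writer B also reads version 0
print(store.compare_and_swap("ep", seen_by_b, 2, {"A": "10.0.0.2"}))  # True
print(store.compare_and_swap("ep", seen_by_a, 1, {"A": "10.0.0.1"}))  # False: A is stale
print(store.get("ep").plan_version)  # 2
```

The extra `new_version <= cur.plan_version` check is a design choice: it also refuses to move an endpoint backwards even when the expected version happens to match.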

1. donavanm No.45681204
See https://news.ycombinator.com/item?id=45681136. The actual DNS mutation API does, effectively, CAS. They had multiple unsynchronized writers that raced without logical constraints or ordering on the changes. Without thinking about it much, they _might_ have been able to implement something like a version vector, either by updating the zone serial or through another "sentinel record" that was always included in the ChangeRRSets calls affecting that label/zone; for example, a TXT record containing a serialized change-set number or a "checksum" of the old + new state.
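A toy model of the sentinel-record idea, assuming a batch API that validates every change before applying any of them and only DELETEs an exact match (the "effectively CAS" behavior described above). `FakeZone`, `Change`, `apply_plan` and the `_plan.<label>` TXT name are all hypothetical illustrations, not a real DNS client API. Because each batch deletes the sentinel value the writer read, a writer working from an outdated serial gets its whole batch rejected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    action: str  # "CREATE", "DELETE" or "UPSERT" (hypothetical subset)
    name: str
    rtype: str
    value: str


class BatchRejected(Exception):
    pass


class FakeZone:
    """Toy zone: a batch is validated against a staged copy, then swapped in whole."""

    def __init__(self) -> None:
        self.rrsets: dict = {}

    def change_rrsets(self, changes: list) -> None:
        staged = dict(self.rrsets)
        for c in changes:
            key = (c.name, c.rtype)
            if c.action == "DELETE":
                # DELETE must match the current value exactly: this is the CAS.
                if staged.get(key) != c.value:
                    raise BatchRejected(f"DELETE mismatch on {key}")
                del staged[key]
            elif c.action == "CREATE":
                if key in staged:
                    raise BatchRejected(f"CREATE on existing {key}")
                staged[key] = c.value
            elif c.action == "UPSERT":
                staged[key] = c.value
            else:
                raise ValueError(f"unknown action {c.action}")
        self.rrsets = staged  # reached only if every change validated


def apply_plan(zone: FakeZone, label: str, seen_serial: int,
               new_serial: int, address: str) -> None:
    sentinel = f"_plan.{label}"  # hypothetical sentinel TXT name
    zone.change_rrsets([
        Change("DELETE", sentinel, "TXT", f"serial={seen_serial}"),
        Change("CREATE", sentinel, "TXT", f"serial={new_serial}"),
        Change("UPSERT", label, "A", address),
    ])


zone = FakeZone()
zone.rrsets[("_plan.api", "TXT")] = "serial=1"
apply_plan(zone, "api", 1, 2, "10.0.0.2")      # writer saw serial=1: accepted
try:
    apply_plan(zone, "api", 1, 3, "10.0.0.3")  # writer also saw serial=1: now stale
except BatchRejected as err:
    print("rejected:", err)
print(zone.rrsets[("api", "A")])               # still 10.0.0.2
```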

I'm guessing the "plans" aspect skipped that and they were just applying the intended state, without trying to serialize the changes. And it's last-write-wins, until it doesn't.
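A toy illustration of why last-write-wins breaks under reordering, assuming (hypothetically) that each plan carries a generation number and that writers apply whatever plan they hold whenever they get to it. `Plan`, `apply_lww` and `replay` are made up for illustration and do not model AWS's actual system; the point is only that arrival order, not plan generation, decides what ends up live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    generation: int  # hypothetical monotonically increasing plan number
    addresses: tuple


def apply_lww(zone: dict, label: str, plan: Plan) -> None:
    # Last-write-wins: whichever plan is written last becomes live, whatever its generation.
    zone[label] = plan


def replay(arrivals: list) -> Plan:
    zone: dict = {}
    for plan in arrivals:
        apply_lww(zone, "api", plan)
    return zone["api"]


older = Plan(1, ("10.0.0.1",))
newer = Plan(2, ("10.0.0.2",))
print(replay([older, newer]).generation)  # 2: arrival order matches plan order
print(replay([newer, older]).generation)  # 1: a delayed writer clobbers the newer plan
```

Nothing in `apply_lww` can tell the two orderings apart, which is why some conditional write (a CAS on a version, or a sentinel record in the same atomic batch) is needed to make the second ordering fail instead of silently winning.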