chacham15 No.43554350
So, if I understand correctly, the consistency model is essentially git's. I.e. you have a local copy, make changes to it, and then when it's time to "push" you can hit a conflict, at which point you either "rebase" or "merge".

The problem here is that there is no way to cleanly detect a conflict. The documentation talks about pages which have changed, but a page changing isn't a good indicator of a conflict, because a conflict can also be caused by a read. E.g.:

Update customer id:

    UPDATE Customers SET id='bar' WHERE id='foo';
    UPDATE Orders SET customerId='bar' WHERE customerId='foo';

Add customer purchase:

    SELECT id FROM Customers WHERE email='blah';
    INSERT INTO Orders(customerId, ...) VALUES('foo', ...);

If the update transaction commits first and the pages for the Orders table are full (i.e. the insert causes a new page to be allocated), these two operations don't have any page conflicts, but the result is incorrect: the new order points at a customer id that no longer exists.

In order to fix this, you would need to track the pages read during the transaction in which the write occurred, but that could easily end up being the whole table if the update column isn't part of an index (and thus requires a table scan).
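
Roughly, detection would have to look something like this (a minimal sketch in Python; Transaction and commit_log are illustrative names, not anything from Graft's actual API):

    # Hypothetical page-level conflict detection. The key point: a commit
    # conflicts with us if it wrote a page we READ, not just a page we wrote.
    class Transaction:
        def __init__(self, snapshot_version):
            self.snapshot_version = snapshot_version
            self.read_pages = set()
            self.written_pages = set()

        def read(self, page_id):
            self.read_pages.add(page_id)    # a table scan puts every page here
            # ... fetch page contents ...

        def write(self, page_id):
            self.written_pages.add(page_id)
            # ... buffer the page mutation ...

    def has_conflict(txn, commit_log):
        for commit in commit_log.since(txn.snapshot_version):
            if commit.written_pages & (txn.read_pages | txn.written_pages):
                return True
        return False

In the example above, the SELECT puts the Customers pages into the read set, so the committed UPDATE to Customers is flagged even though the INSERT only allocated a fresh Orders page.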

ncruces No.43554646
They address this later on.

If strict serializability is not possible, because your changes are based on a snapshot that is already invalid, you can either replay (your local transactions are not durable, but system-wide you regain serializability) or merge (degrading to snapshot isolation).

As long as local unsynchronized transactions retain their page read set, and look for conflicts there, this should be sound.
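
Something like the following (a sketch only; sync, replay, and merge are assumed names, not Graft's interface):

    # The two recovery paths: replay (rewrite local history, stay
    # serializable) or merge (keep both writes, degrade to snapshot
    # isolation).
    def sync(local_txns, server):
        changed = server.pages_changed_since(local_txns[0].snapshot_version)
        stale = [t for t in local_txns
                 if changed & (t.read_pages | t.written_pages)]
        if not stale:
            server.push(local_txns)   # snapshot still valid: strict serializability
        elif all(t.can_replay for t in stale):
            # Re-execute against the new snapshot. The original local
            # transactions were not durable, but the global history
            # remains serializable.
            replay(stale, server.latest_snapshot())
        else:
            # Apply both sides' writes; these transactions only get
            # snapshot isolation.
            merge(stale, server.latest_snapshot())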

bastawhiz No.43556414
> your local transactions are not durable

To the user, though, this just manifests as data loss. You do something, it looks like it worked, but then it goes away later.

ncruces No.43556755
From the description, you can reapply transactions. How the system handles it (how much is up to the application, how much is handled in graft), I have no idea.
bastawhiz No.43559031
What does that mean though? How can you possibly reapply a failed transaction later? The database itself can't possibly know how to reconcile that (if it did, it wouldn't have been a failure in the first place). So it has to be done by the application, and that isn't always possible. There is still always the possibility of unavoidable data loss.
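
To make that concrete, any application-level reapply has to look roughly like this (a purely hypothetical shape, no particular library):

    # The database can re-execute statements, but only the application
    # knows whether the original *intent* still holds against the new state.
    def reapply(action, db):
        current = db.read(action.keys)        # re-read fresh state
        if action.still_valid(current):       # e.g. "customer 'foo' still exists"
            db.apply(action)
        else:
            report_dropped_change(action)     # the alternative is silent loss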

"Consistency" is really easy, as it turns out, if you allow yourself to simply drop any inconvenient transactions at some arbitrary point in the future.

kikimora No.43562926
This! Solving merge conflicts in git is quite hard. Building an app such that it has a UI and use cases for merging every operation is just unrealistic. Perhaps if you limit yourself to certain domains like CRDTs, turn-based games, or data silos modified by only one customer it can be useful. I doubt it can work in the general case.
bastawhiz No.43563887
The only situation I can think of where it's always safe is if the order in which you apply changes to the state never matters (see the sketch after this list):

- Each action increments or decrements a counter

- You have a log of timestamps of actions stored as a set

- etc.
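
A minimal illustration of that property (plain Python, nothing system-specific): commutative actions converge to the same state under every ordering.

    from itertools import permutations

    def apply_all(actions, state=0):
        # Increments and decrements commute, so ordering is irrelevant.
        for op, amount in actions:
            state += amount if op == "incr" else -amount
        return state

    actions = [("incr", 3), ("decr", 1), ("incr", 2)]
    # All six permutations of the same action set yield 4.
    assert {apply_all(p) for p in permutations(actions)} == {4}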

If you can't model your changes to the data store as an unordered set of actions and have that materialize into state, you will have data loss.

Consider a scenario with three clients, each of which dispatches an action. If action 1 sets value X to true, action 2 sets it to true, and action 3 sets it to false, you have no way to know whether X should be true or false. Even with timestamps, unless you have a centralized writer you can't possibly know whether some, none, or all of the timestamps the clients used are accurate.
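
The same ambiguity in toy form (assuming naive last-writer-wins; no particular database implied):

    def final_x(arrival_order):
        x = None
        for value in arrival_order:   # last writer wins
            x = value
        return x

    # The identical set of actions, two plausible arrival orders:
    assert final_x([True, True, False]) is False
    assert final_x([False, True, True]) is True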

Truly a hard problem!