-
Notifications
You must be signed in to change notification settings - Fork 113
Specification for distributed transactions in a Raft based system
This protocol addresses the management of distributed transactions in a Raft-based system. The system's state is replicated using the Raft consensus algorithm, ensuring data consistency and fault tolerance. Write transactions are only possible on the leader node, while read-only transactions can occur on any node. This specification considers situations when data are stored in a single key-value map. But all use cases can be extended to situations when several key-value maps are handled. In the context of Xodus key-value map represents a single store instance.
In a distributed transaction system, three types of dependencies can lead to anomalies:
- Read-write (RW) dependencies: When a transaction reads a data item that has been or will be modified by another transaction. This can lead to a lost update anomaly.
- Write-write (WW) dependencies: When two transactions modify the same data item concurrently. This can result in an overwritten update anomaly.
- Write-read (WR) dependencies: When a transaction modifies a data item that has been or will be read by another transaction. This can cause the write skew anomaly.
- The OM is an in-memory data structure that contains a map with the hash codes of keys, a list of operations sorted by timestamps, and the timestamp of the newest transaction that reads each key. Read-only transactions do not update the read timestamp because they do not suffer from the presence of WR dependencies.
- OM is consulted by the write transactions when determining whether to proceed with modifications or to roll back due to WR dependencies.
- Timestamp Management:
- Timestamps are increased only by write transactions.
- Read-only transactions reuse the same timestamp value if no write transactions occur, minimizing the overhead of managing timestamps for read-only transactions.
- Handling Read-Write (RW) Dependencies:
- Each transaction maintains a snapshot of running transactions at the beginning, in the form [Tmin, T1, T2, Tmax), where Tmin represents the minimum timestamp for committed transactions (all write transactions with timestamp less or equal to Tmin are already committed or aborted), T1 and T2 are committed transactions (or aborted), and Tmax represents the maximum timestamp since which all transactions are uncommitted (aborted, in-progress or not started yet).
- Transactions use the snapshot evaluated on the base of OM by fetching the last operation for provided key in the committed state with a timestamp less or equal to the transaction timestamp and visible according to the transaction's snapshot taken at the begging of the transaction, to provide a a consistent view of the data, preventing anomalies resulting from RW dependencies.
- Handling Write-Write (WW) Dependencies:
- Changes made by concurrent write transactions are applied in the order of their timestamps. This order is ensured by applying operations from OM to the key-value map for committed transactions in order of operations timestamps in a background thread. Only committed transactions and only not overwritten at the moment of migration key-value pairs will be moved to the key-value map.
- This ordering ensures that WW conflicts are resolved without the need for rollbacks, maintaining data consistency and serializability.
- Handling Write-Read (WR) Dependencies:
- Write transactions consult the Operations Map (OM) to check if their timestamp is greater than the timestamp of the last read transaction for the key they intend to modify. If not, the write transaction waits for a predefined amount of time till the commit of the dependent transaction and rolls back on the timeout.
- This approach prevents WR anomalies while balancing liveness and the number of rollbacks in the system.
Based on the protocol details here is an analysis of potential liveness and transaction anomaly issues:
-
Liveness: This protocol's primary concern for liveness is the potential for write transactions to wait indefinitely due to Write-Read (WR) dependencies. The protocol includes a predefined waiting period for the write transactions to address this issue. If the waiting period expires, the write transaction rolls back, ensuring liveness in the system. This approach helps to maintain system performance and balance the liveness of transactions.
However, choosing an appropriate waiting period based on the specific use case and system characteristics is essential, as it may impact system performance. If the waiting period is longer, it could lead to many rollbacks and reduced throughput. On the other hand, if the waiting period is too long, it could increase latency for writing transactions.
-
Transaction Anomalies: The protocol handles Read-Write (RW), Write-Write (WW), and Write-Read (WR) dependencies to prevent transaction anomalies:
- RW Dependencies: By maintaining a snapshot of running transactions, the protocol ensures read-only transactions have a consistent view of the data, preventing anomalies caused by RW dependencies.
- WW Dependencies: By applying changes made by concurrent write transactions in the order of their timestamps, WW conflicts are resolved without the need for rollbacks, maintaining data consistency and serializability.
- WR Dependencies: By consulting the Operations Map (OM) and checking the timestamp of the last read transaction for the key being modified, the protocol prevents WR anomalies while balancing liveness and the number of rollbacks in the system.
Given the protocol's design, it effectively addresses the primary sources of transaction anomalies.
- Because of the approach used to resolve RW dependencies, the protocol can not strictly be claimed as one that provides serializable isolation. This is because given a set of execution of read and write transactions. There could be situations when those executions can not be presented as such that could be executed in a single thread by some order. On another side, this protocol lacks any known transaction anomalies. So this protocol's isolation level lies between a snapshot and serializable isolation levels.
- We need to write a TLA+ presentation of this protocol and test it on typical properties such as liveness, safety and fairness. https://youtrack.jetbrains.com/issue/XD-931/Test-protocol-of-distributed-transactions-using-TLA