Replies: 2 comments 3 replies
-
I agree. I actually think we would never need to synchronize checkpoints in a distributed way. The worst case would be checkpoints taking up enough IO/CPU resources on the writer that ingestion latency is significantly impacted. To address that, we can just run checkpoints in a dedicated checkpoint worker that only performs checkpoints.
Instead of loading the table from scratch in the checkpoint thread, how about we just send the necessary fields of the `DeltaTable` struct over directly through the channel?
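For what it's worth, a minimal sketch of that idea using `std::sync::mpsc`, with a made-up `TableStateSnapshot` standing in for whichever `DeltaTable` fields the checkpoint actually needs:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the DeltaTable fields a checkpoint needs
// (version, active files, tombstones, etc. in the real struct).
#[derive(Debug)]
struct TableStateSnapshot {
    version: i64,
    files: Vec<String>,
}

fn main() {
    let (tx, rx) = mpsc::channel::<TableStateSnapshot>();

    // Dedicated checkpoint worker: receives the state directly instead of
    // reloading the table from storage.
    let worker = thread::spawn(move || {
        for snapshot in rx {
            // A real worker would write the parquet checkpoint here.
            println!("checkpointing version {} ({} files)", snapshot.version, snapshot.files.len());
        }
    });

    // Writer side: after committing a version, hand the in-memory state over.
    tx.send(TableStateSnapshot {
        version: 10,
        files: vec!["part-00000.parquet".into()],
    })
    .unwrap();

    drop(tx); // closing the channel lets the worker loop exit
    worker.join().unwrap();
}
```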
-
I've submitted an initial PR in #280 with some design-oriented notes. After review there, I'll update this discussion with next steps.
-
Summary
This is an open forum to discuss our solution for writing checkpoint files (See also #106).
Some up front opinions of mine that affect the design:

1. If the checkpoint worker crashes or a checkpoint write fails, that's not the end of the world - the worst that can happen is that readers will have to load some extra JSON log entries until a later checkpoint succeeds.
2. Overwriting `_last_checkpoint` with an older checkpoint version in a multi-writer scenario is also not _that_ bad. Again - the worst that can happen is that readers reading at the time will have to load some extra JSON log entries.

These two opinions have a strong impact on the design. Basically - I'd like to avoid the complexity of coordinating checkpoints in a distributed way and accept some trade-offs that do not break log correctness.
Design
Static Structure
The image below shows a static structure diagram that mentions some existing relationships in delta-rs and some additions to support writing checkpoints. Blue highlights are adds, red highlights are especially important deps. Each `DeltaTable` instance already holds the last checkpoint in memory. The checkpoint includes a `version` field which is useful for the snapshot logic. The added `DeltaTable` `last_checkpoint_version()` method just exposes the version of the last checkpoint to callers who need to determine whether they should run a checkpoint or not.

Delta writes must always go through a `DeltaTransaction`. The committed version is useful to allow callers to determine whether a checkpoint should be written or not.

The static structure also proposes a `CheckPointWriter` struct with a public `run_loop` method and a public `tx` (aka "sender") field. The `CheckPointWriter` `run_loop` method may be started on a separate thread for any application that wants to periodically create checkpoints and publish checkpoint intents as versions are written. This channel may be used internally by higher level writers or managed externally.
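To make the proposed shape concrete, a rough Rust sketch follows. The names `CheckPointWriter`, `run_loop`, and `tx` come from the description above; the `CheckpointIntent` message type, the channel choice, and all signatures are assumptions for illustration, not the actual delta-rs API:

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Assumed message type: a writer publishes this after committing a version
// it thinks should be followed by a checkpoint.
struct CheckpointIntent {
    table_uri: String,
    version: i64,
}

// Sketch of the proposed CheckPointWriter: a public `tx` for publishing
// intents and a `run_loop` that drains them, typically on its own thread.
struct CheckPointWriter {
    pub tx: Sender<CheckpointIntent>,
    rx: Receiver<CheckpointIntent>,
}

impl CheckPointWriter {
    fn new() -> Self {
        let (tx, rx) = channel();
        CheckPointWriter { tx, rx }
    }

    pub fn run_loop(self) {
        let CheckPointWriter { tx, rx } = self;
        drop(tx); // drop our own sender so the loop ends once external senders are gone
        for intent in rx {
            // Real implementation: load the table at intent.version and
            // call delta_table.create_checkpoint().
            println!("checkpoint intent: {} @ v{}", intent.table_uri, intent.version);
        }
    }
}

fn main() {
    let writer = CheckPointWriter::new();
    let tx = writer.tx.clone();
    let handle = thread::spawn(move || writer.run_loop());

    tx.send(CheckpointIntent {
        table_uri: "s3://bucket/my_table".into(),
        version: 42,
    })
    .unwrap();

    drop(tx); // no more intents; run_loop will exit
    handle.join().unwrap();
}
```

Keeping `tx` public is what lets higher-level writers publish intents without knowing how (or whether) the loop is hosted on a separate thread.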
Interaction

Client Perspective
This next image shows a sequence diagram from the client perspective, walking through the steps a client (aka writer) will take.

To prevent checkpoint creation from slowing down data writes, the client can use the above-mentioned data points coupled with the static structure design to publish a checkpoint intention for a separate thread to handle. Alternatively (and not pictured), a client could run checkpoints explicitly without hosting a separate thread for the `CheckPointWriter`. In that case, rather than sending a checkpoint intention to the writer channel, the client could just invoke `delta_table.create_checkpoint()` when appropriate.
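As an illustration of that decision point, here is a hedged sketch with stand-in types. The `CHECKPOINT_INTERVAL` policy and every signature are assumptions; only the method names `last_checkpoint_version()` and `create_checkpoint()` come from the proposal above:

```rust
use std::sync::mpsc::Sender;

// Stand-in for DeltaTable; method names follow the proposal, everything
// else is assumed for the sake of the example.
struct DeltaTable {
    last_checkpoint_version: i64,
}

impl DeltaTable {
    fn last_checkpoint_version(&self) -> i64 {
        self.last_checkpoint_version
    }
    fn create_checkpoint(&self) {
        println!("checkpoint written inline");
    }
}

// Hypothetical policy: checkpoint every 10 commits.
const CHECKPOINT_INTERVAL: i64 = 10;

// Called by the client right after the DeltaTransaction commit returns a version.
fn after_commit(table: &DeltaTable, committed_version: i64, tx: Option<&Sender<i64>>) {
    if committed_version - table.last_checkpoint_version() >= CHECKPOINT_INTERVAL {
        match tx {
            // Preferred: publish an intent and let the CheckPointWriter thread do the work.
            Some(sender) => {
                let _ = sender.send(committed_version);
            }
            // Alternative (not pictured): no CheckPointWriter thread, checkpoint inline.
            None => table.create_checkpoint(),
        }
    }
}

fn main() {
    let table = DeltaTable { last_checkpoint_version: 30 };
    after_commit(&table, 42, None); // checkpoints inline since 42 - 30 >= 10
}
```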
CheckPointWriter Perspective

No pics here at the moment, but upon receiving a message to create a checkpoint, a `CheckPointWriter` instance must:

1. call `load_with_version` to load the table state at the requested version
2. call `delta_table.create_checkpoint()`
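In code form, that handler is roughly the following sketch. The types are stand-ins and the real `load_with_version` and `create_checkpoint` signatures may differ:

```rust
// Stand-in for DeltaTable; both methods are placeholders for the real calls.
struct DeltaTable {
    version: i64,
}

impl DeltaTable {
    fn load_with_version(&mut self, version: i64) -> Result<(), String> {
        // Replays the delta log so the in-memory state matches `version`.
        self.version = version;
        Ok(())
    }

    fn create_checkpoint(&self) -> Result<(), String> {
        // Writes the current DeltaTableState out as parquet checkpoint file(s).
        println!("checkpoint written for version {}", self.version);
        Ok(())
    }
}

// What the CheckPointWriter does for each received checkpoint intent.
fn handle_intent(table: &mut DeltaTable, version: i64) -> Result<(), String> {
    table.load_with_version(version)?;
    table.create_checkpoint()
}

fn main() {
    let mut table = DeltaTable { version: 0 };
    handle_intent(&mut table, 42).unwrap();
}
```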
DeltaTable.create_checkpoint Perspective
No pics here at the moment either, but ultimately this method just needs to write the associated `DeltaTableState` as one or more parquet files. The image under the "Static Structure" section describes details about the associations that make this possible and the fields of `DeltaTableState` that must be accounted for in the parquet file writes.

I'll repeat one more time - I think we can avoid synchronizing checkpoint writes in a distributed way for now and add this as an optimization later, because of the points mentioned in the summary regarding worker crash or backtracking `_last_checkpoint`.

Thoughts?