Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: changes are not delivered after a sync message wasn't delivered #368

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

yarolegovich
Copy link
Contributor

@yarolegovich yarolegovich commented Aug 2, 2024

  1. syncState is saved locally right after A.generateSyncMessage, even if the message delivery fails because there's no feedback from the network subsystem.
  2. We'll get into an infinite loop of sync messages with the peer after they or us make the next change. theirNeed will be ignored because these hashes are true in sentHashes.
  3. We won't send these changes until the Repo gets recreated (in-memory syncState gets lost and storage only has sharedHeads).

image

Also I have a cast to any because the real types are not what the declaration states. Some exposed fields are missing as well (inFlight, haveResponded).

In the tests I added:

  1. Alice and Bob share a document.
  2. Bob gets into a state where messages will fail and makes a change to the document.
  3. Connectivity gets restored.
  4. In the first test Bob makes another change. No changes are ever delivered to Alice.
  5. In the second test Alice make a change and it gets delivered to Bob, but Bob's change never gets delivered to Alice.

@alexjg
Copy link
Contributor

alexjg commented Aug 5, 2024

Thanks for digging into this. Firstly, can you check if this is still a problem rebase on main? I've tried running the tests you've written with the fix (the code in DocSynchronizer) commented out and the tests still pass which makes me worry that the tests are not working as intended.

To the actual issue. The sync protocol currently assumes a reliable in-order stream, i.e. we assume that there are no dropped or re-ordered messages. Under that model the missing piece here is that the network adapter needs to signal that a connection has been lost by emitting a peer-disconnected event, this will cause the Repo to reset it's sync state. Now, we can also fix this specific issue by implementing the logic you've described in automerge directly and I think we should do that (I'll put in a PR for it later) but we use the sentHashes for tracking all kinds of in-flight changes, not just the needed ones. The in-flight tracking was introduced in the first place because otherwise we found pretty severe message amplification issues in scenarios where some peers have much faster connections than others and/or are producing changes at a higher rate. I really want to remove the reliable in-order stream requirement but I think doing so is more complicated than just removing the in-flight tracking and should instead be achieved by the sync protocol redesign we're working on.

To summarize, I think we should make sure the tests are actually failing when we expect them to and then we should "fix" the issue by ensuring that a peer-disconnected event is emitted after the dropped messages to simulate the actual requirements on the NetworkAdapter being met. I will modify the sync logic to always send needed changes but this does not lift the reliable in-order stream requirement which I think we should instead fix in the new sync protocol.

@alexjg
Copy link
Contributor

alexjg commented Aug 14, 2024

I've been thinking about this a little more and had a chat with @pvh about it. One option which might make sense would be to disable the in-flight message tracking as a config option on the sync protocol. This would remove the requirement for reliable in-order streams at the cost of some message amplification. This seems like a reasonable intermediate step between now and the new sync protocol. @yarolegovich would that fit your needs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants