Data consistency: queue tasks in transaction #277

Open
majsan opened this issue May 29, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@majsan
Member

majsan commented May 29, 2024

We use MariaDB for persistence and Elasticsearch (ES) for search. When we save something in MariaDB, we have no guarantee that it is also added to ES, since ES might be down or fail to respond. Currently we cannot even detect when this happens (though in theory it should be visible in the logs).

To solve this, add a new table for queuing tasks to be done in the background. A task can be adding, deleting or updating an entry in ES. Tasks are added in the same transaction as the corresponding add, delete or update in MariaDB. A background worker reads the task table and performs the tasks. When a task completes, the worker removes its row from the table.
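A minimal sketch of the idea, not the project's actual schema: the table layout, column names and `save_entry` are hypothetical, and the standard-library `sqlite3` module stands in for MariaDB so the example runs on its own. The point is only that the entry write and the task row share one transaction.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the entry table and the task queue table (hypothetical layout)."""
    conn.executescript(
        """
        CREATE TABLE entries (id TEXT PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- gives the processing order
            op TEXT NOT NULL,                      -- 'index' or 'delete'
            entry_id TEXT NOT NULL,
            payload TEXT                           -- JSON body for 'index'
        );
        """
    )


def save_entry(conn: sqlite3.Connection, entry_id: str, body: dict) -> None:
    """Write the entry and queue the matching ES task atomically."""
    payload = json.dumps(body)
    # The connection context manager commits if the block succeeds and
    # rolls back both statements if either one raises.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO entries (id, body) VALUES (?, ?)",
            (entry_id, payload),
        )
        conn.execute(
            "INSERT INTO tasks (op, entry_id, payload) VALUES ('index', ?, ?)",
            (entry_id, payload),
        )
```

If the task insert fails, the entry write is rolled back as well, so there is never a saved entry without a queued task.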

What can still happen is that the background worker fails to remove the row from the task table even though the task succeeded. Because of this, it is important that the tasks are idempotent, i.e. can be run again without breaking data consistency. The order of the tasks also matters, for example if the same entry is edited again before the worker has processed the first edit.
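The two requirements could look roughly like this in a worker. This is a sketch under assumptions: `TASKS_SCHEMA` and `process_pending` are hypothetical names, `sqlite3` stands in for MariaDB, and a plain dict stands in for the ES index. Writing a document under an explicit id overwrites any previous version, which is what makes replaying a task harmless.

```python
import json
import sqlite3

# Hypothetical task table; the auto-incrementing id records insertion order.
TASKS_SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    entry_id TEXT NOT NULL,
    payload TEXT
)
"""


def process_pending(conn: sqlite3.Connection, index: dict) -> int:
    """Apply all queued tasks in insertion order; return how many were applied."""
    rows = conn.execute(
        "SELECT id, op, entry_id, payload FROM tasks ORDER BY id"
    ).fetchall()
    applied = 0
    for task_id, op, entry_id, payload in rows:
        if op == "index":
            # Overwrite by id: running this twice leaves the same document.
            index[entry_id] = json.loads(payload)
        elif op == "delete":
            # Deleting a missing document is a no-op, so replays are safe.
            index.pop(entry_id, None)
        else:
            raise ValueError(f"unknown task op: {op!r}")
        # Remove the row only after the task succeeded. If this step fails,
        # the task is simply applied again on the next run.
        with conn:
            conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
        applied += 1
    return applied
```

Processing by ascending `id` means that when the same entry is edited twice before the worker runs, the later edit is applied last.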

@nick8325
Contributor

We also need some way of handling plugins. E.g. in the places repository we use the link plugin to fetch info from the municipalities repository. Then if we e.g. update the entry for Göteborgs kommun we also need to reindex all places that refer to Göteborgs kommun.

@nick8325
Contributor

In particular, here are some things that ought to work:

  1. If we update or delete the entry for Göteborgs kommun, then we should reindex the entry for Göteborg.
  2. Suppose that to begin with we have an entry for Göteborg but no entry for Göteborgs kommun (so the link plugin returns no data). If we add an entry for Göteborgs kommun, we should reindex the entry for Göteborg.
  3. If we update the resource config for the municipalities resource, then we should reindex the places resource too.

I had been thinking about keeping track of when one entry depends on another (e.g. Göteborg depends on Göteborgs kommun), but that doesn't work in case 2 since there's no existing entry to depend on. I guess in this case we could run an Elasticsearch query to find out which places to update - give me all places that refer to Göteborgs kommun (or rather municipality = 1480)? Not sure how this would work in general.
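As a rough illustration of the query idea, the request body could be built like this. `referrers_query` is a hypothetical helper; the field name `municipality` and the value 1480 are taken from the example above, and the sketch assumes the field is indexed so that an exact-match `term` query works on it.

```python
def referrers_query(field: str, value) -> dict:
    """Build an ES search body matching entries whose `field` equals `value`."""
    return {
        "_source": False,  # only the ids of the hits are needed for reindexing
        "query": {"term": {field: value}},
    }


# "Give me all places that refer to municipality 1480."
body = referrers_query("municipality", 1480)
```

Because the lookup goes from the referenced value to the referring entries, it also covers case 2, where no dependency could have been recorded in advance. A real implementation would have to page through the results, since a search returns only the first 10 hits by default.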

@majsan
Member Author

majsan commented May 31, 2024

Using Elasticsearch to find all references is probably enough. The reindexing of the referring entries then has to be a task in the task table as well, created in the same transaction as the original edit.
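One way this might look, as a sketch only: the edit queues two tasks in its transaction, one for the edited entry and one telling the worker to look up and reindex the referrers later. The schema, the `reindex_referrers` op and all function and parameter names are hypothetical, and `sqlite3` again stands in for MariaDB.

```python
import json
import sqlite3

# Hypothetical schema: entries are keyed by (resource, id).
SCHEMA = """
CREATE TABLE entries (
    resource TEXT NOT NULL,
    id TEXT NOT NULL,
    body TEXT NOT NULL,
    PRIMARY KEY (resource, id)
);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT NOT NULL,
    payload TEXT NOT NULL
);
"""


def save_linked_entry(
    conn: sqlite3.Connection,
    resource: str,
    entry_id: str,
    body: dict,
    referring_resource: str,
    link_field: str,
) -> None:
    """Save an entry and queue both its own indexing and the referrer reindex."""
    index_task = {"resource": resource, "id": entry_id, "body": body}
    # The worker resolves this later by querying ES for matching entries.
    referrer_task = {
        "resource": referring_resource,
        "field": link_field,
        "value": entry_id,
    }
    with conn:  # one transaction for the edit and both tasks
        conn.execute(
            "INSERT OR REPLACE INTO entries (resource, id, body) VALUES (?, ?, ?)",
            (resource, entry_id, json.dumps(body)),
        )
        conn.execute(
            "INSERT INTO tasks (op, payload) VALUES ('index', ?)",
            (json.dumps(index_task),),
        )
        conn.execute(
            "INSERT INTO tasks (op, payload) VALUES ('reindex_referrers', ?)",
            (json.dumps(referrer_task),),
        )
```

Queuing the referrer task after the index task means the worker sees the updated municipality before it reindexes the places that link to it.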

@majsan
Member Author

majsan commented May 31, 2024

Since the linking is done in a plugin, I guess the plugin should be responsible for this behavior. Not sure how to trigger it though, since the resource being referenced doesn't know anything about the plugin.

@majsan majsan self-assigned this Nov 15, 2024
@majsan majsan added the enhancement New feature or request label Nov 15, 2024
majsan added a commit that referenced this issue Nov 15, 2024

2 participants