423 Locked error when creating a new component with languages and doing auto-translate #13345

Open
2 tasks done
matzeeable opened this issue Dec 19, 2024 · 9 comments

Comments

@matzeeable

Describe the issue

The steps below describe a sequential process: we create a new component, install an addon, add languages, upload the main POT file, and auto-translate the new languages. Each step is followed by the output from our custom logs.

1.) We create a new component with POST /api/projects/(string: project)/components/

https://translate.example.de/api/tasks/my-uuid/ {
  completed: true,
  progress: 100,
  result: { component: 10063 },
  log: 'wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version: rebase remote into repo b7600076b4c0975eb0c6bd90d4943afa75c74bae..b7600076b4c0975eb0c6bd90d4943afa75c74bae\n' +
    'wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version: scheduling update in background'
}

✅ This is successful.
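
A CI script could use the task payload above by polling the task URL until `completed` is true before moving to the next step. The following is a minimal Python sketch under assumptions: `fetch_task` is a hypothetical callable that returns the decoded JSON of GET /api/tasks/(uuid)/, and the interval and poll limit are illustrative. Whether waiting for this task also avoids the later 423 is not established in this thread.

```python
import time
from typing import Any, Callable, Dict


def wait_for_task(
    fetch_task: Callable[[], Dict[str, Any]],  # hypothetical: returns the task JSON
    interval: float = 1.0,                     # illustrative polling interval
    max_polls: int = 60,                       # illustrative upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task until its payload reports completed, then return it."""
    for poll in range(max_polls):
        task = fetch_task()
        if task.get("completed"):
            return task
        # Do not sleep after the final unsuccessful poll.
        if poll < max_polls - 1:
            sleep(interval)
    raise TimeoutError("task did not complete within the polling limit")
```

Passing `sleep` as a parameter keeps the helper testable without real delays.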

2.) We install an addon with POST /api/components/(string: project)/(string: component)/addons/

Install addon weblate.gettext.msgmerge in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Installed addon weblate.gettext.msgmerge in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version

✅ This is successful.

3.) We create languages in the freshly created component with POST /api/components/(string: project)/(string: component)/translations/

Create missing language fr@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language de@informal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language de@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language it@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language pl@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language nl@informal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Create missing language nl@formal in wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version...
Created missing languages

✅ This is successful.

4.) We upload the main POT file with POST /api/translations/(string: project)/(string: component)/(string: language)/file

Uploaded new source file: {
  not_found: 0,
  skipped: 0,
  accepted: 74,
  total: 74,
  result: true,
  count: 74
}

✅ This is successful.

5.) We auto-translate the created languages using machine translation with POST /api/translations/(string: project)/(string: component)/(string: language)/autotranslate/

Auto translate fix-set-default-api-doc-version "fr@formal" (mode = translate, filter_type = todo, auto_source = mt, engines = deepl, component = , threshold = 90)...
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 1/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 2/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 3/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 4/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Locked. Retrying in 5 seconds... (Attempt 5/5)
Endpoint request failed: translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/
Max retries (5) reached. Giving up.

❌ The response is a 423 Locked error. All the requests to the REST API are sequential and never run in parallel. In our tests we also made sure that the component is not locked by user access.

However, I found issue #4666, in particular the comment #4666 (comment). Relevant code:

weblate/weblate/vcs/apps.py

Lines 104 to 116 in f595bb9

def post_migrate(self, sender: AppConfig, **kwargs) -> None:
    ensure_ssh_key()
    home = data_dir("home")
    if not os.path.exists(home):
        os.makedirs(home)
    # Configure merge driver for Gettext PO
    # We need to do this behind lock to avoid errors when servers
    # start in parallel
    lockfile = WeblateLock(
        home, "gitlock", 0, "", "lock:{scope}", "{scope}", timeout=120
    )

Is there a way to find out what is holding the lock? And why is the timeout 120 seconds?

I already tried

  • I've read and searched the documentation.
  • I've searched for similar filed issues in this repository.

Steps to reproduce the behavior

See above.

Expected behavior

No lock error

Screenshots

No response

Exception traceback

No response

How do you run Weblate?

Docker container

Weblate versions

No response

Weblate deploy checks

No response

Additional context

No response

@nijel
Member

nijel commented Dec 19, 2024

Weblate internally locks the component for some operations, for example to avoid concurrent manipulation of the files. Retrying later should work. You should be able to see what is going on in the server logs.

@matzeeable
Author

matzeeable commented Dec 20, 2024

Command to get the log:

sudo docker logs 3dd8d9fc44a1 --since "2024-12-19T10:35:27+01:00" --until "2024-12-19T10:50:27+01:00" \
    | grep -vE "received$|this revision has been already parsed, skipping update$" \
    | grep -v "wordpress-real-cookie-banner-frontend-javascript" \
    | grep -v "wordpress-real-cookie-banner-wordpressorg-readme" \
    | grep -v "wordpress-real-media-library" \
    | grep -v "wordpress-real-physical-media" \
    | grep -v "wordpress-real-thumbnail-generator" \
    | grep -v "wordpress-real-category-management" \
    | grep -v "devowl-wp-utils" \
    | grep 'wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal: starting automatic translation None: mt: deepl' -A 2000 --color -m 1

As you can see, there are other components (e.g. wordpress-real-media-library) updated concurrently, but this should not lead to any issues as the lock is at component-level, I guess.

This is our server log from the first /autotranslate request:

gunicorn stderr | [2024-12-19 10:38:42,950: INFO/20468] wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal: starting automatic translation None: mt: deepl
gunicorn stderr | INFO:weblate:wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal: starting automatic translation None: mt: deepl
nginx stdout | 127.0.0.1 - - [19/Dec/2024:10:38:47 +0100] "GET /healthz/ HTTP/1.1" 200 12 "-" "curl/7.88.1"
nginx stdout | 152.53.135.192 - - [19/Dec/2024:10:38:47 +0100] "POST /api/translations/wordpress-real-cookie-banner-backend-php/fix-set-default-api-doc-version/fr@formal/autotranslate/ HTTP/1.1" 423 77 "-" "axios/1.7.2"

From previous calls (I do not know which request, but I think it was the upload of the new source file), I can see log entries such as "Acquired Lock('lock:lock:repo:10052')." and "Acquired Lock('lock:lock:repo:non').". Could they cause the issue?

Would it be possible to expose the lock timeout through the Retry-After header? For Continuous Localization I think this would make sense: our CI would not have to use up a fixed number of retries and could instead wait until the lock is released.
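
To illustrate what this would look like on the client side, here is a minimal Python sketch of a retry helper that honors Retry-After on a 423 response when the header is present and otherwise falls back to exponential backoff. It is a sketch under assumptions: the thread does not establish that Weblate sends Retry-After, `send` is a hypothetical callable returning a status code and a header mapping, and the base delay and cap are illustrative values.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 5.0, cap: float = 120.0) -> float:
    """Seconds to wait before the next attempt."""
    value = headers.get("Retry-After")  # assumes this exact header casing
    if value is not None:
        try:
            return min(max(float(value), 0.0), cap)
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch
    # Fallback: exponential backoff, capped.
    return min(base * (2 ** attempt), cap)


def call_with_retry(send: Callable[[], Response], max_retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() and retry while the response status is 423 Locked."""
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 423 or attempt == max_retries:
            break
        sleep(retry_delay(headers, attempt))
    return status, headers
```

If the final attempt is still locked, the helper returns the last 423 response so the caller can decide how to fail.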

Another question: In the access logs, I see that some crawlers access the edit page of strings. Does this also lead to a lock?

For reference, I have found the following locks with a timeout of more than 5 seconds:

self.lock = WeblateLock(
    lock_path=os.path.dirname(base_path),
    scope="repo",
    key=component.pk if component else os.path.basename(base_path),
    slug=os.path.basename(base_path),
    file_template="{slug}.lock",
    timeout=120,
)

return WeblateLock(
    backup_dir, "backuplock", 0, "", "lock:{scope}", ".{scope}", timeout=120
)

weblate/weblate/vcs/apps.py

Lines 114 to 116 in f595bb9

lockfile = WeblateLock(
    home, "gitlock", 0, "", "lock:{scope}", "{scope}", timeout=120
)

@nijel
Member

nijel commented Dec 20, 2024

The timeout applies when acquiring the lock, that is, while waiting for another process that holds it.

The locking happens on component or repository level, so when components share a single repository they will wait for a single lock when Weblate is working with the repository.
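
The behavior described here can be sketched in Python with one lock per repository and a timeout on acquisition. This is only an illustration using `threading.Lock`, not Weblate's `WeblateLock` implementation; the component and repository names are hypothetical.

```python
import threading
from collections import defaultdict
from typing import Dict

# Hypothetical mapping: the first two components share one repository.
REPOSITORY_OF: Dict[str, str] = {
    "backend-php": "shared-repo",
    "frontend-js": "shared-repo",
    "unrelated": "other-repo",
}

# One lock per repository, created on first use.
_repo_locks = defaultdict(threading.Lock)


def acquire_repo_lock(component: str, timeout: float):
    """Return the held repository lock, or None if the wait timed out."""
    lock = _repo_locks[REPOSITORY_OF[component]]
    # The timeout bounds how long we wait while someone else holds the lock.
    if lock.acquire(timeout=timeout):
        return lock
    return None


held = acquire_repo_lock("backend-php", timeout=0.1)
# Same repository: the second component times out while the first holds the lock.
blocked = acquire_repo_lock("frontend-js", timeout=0.05)
# Different repository: not affected.
free = acquire_repo_lock("unrelated", timeout=0.05)
held.release()
```

In this sketch a timed-out acquisition is the point where a server would have to report the resource as locked.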

@matzeeable
Author

Ok, we have this shared repository: https://translate.owlinfra.de/projects/shared-glossaries/real-cookie-banner/:

image

So, in this case, when e.g. WordPress Real Cookie Banner (Backend, PHP) and WordPress Real Cookie Banner (Frontend, JavaScript) call the /autotranslate API endpoint concurrently, they could lock each other?

@nijel
Member

nijel commented Dec 27, 2024

Yes.

@matzeeable
Author

This is only a glossary and is used for the DeepL support for glossaries (#10519). As this glossary is not affected by the e.g. /autotranslate route, I do not understand exactly why this is locked. In general, would it be possible to not lock glossaries at all?

@nijel
Member

nijel commented Jan 14, 2025

> As this glossary is not affected by the e.g. /autotranslate route, I do not understand exactly why this is locked.

I'm confused now: you get the lock error when calling autotranslate, so how is the glossary not affected by it?

> In general, would it be possible to not lock glossaries at all?

Locking is necessary to avoid concurrent operations on the underlying repository. For database operations, we're slowly progressing towards row-level locking, but we're not fully there yet for some code paths.
