Retry on 503 #1408

Merged
merged 11 commits into from
Dec 17, 2024

Conversation

mikealfare
Contributor

@mikealfare mikealfare commented Nov 20, 2024

resolves #682

Problem

We were missing some retry scenarios that BigQuery added over time. We also were not retrying all client factories.

Solution

  • add retries to all client factories
  • instead of replacing the built-in retryable errors, extend them with our additions
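
The "extend, don't replace" idea in the second bullet can be sketched without any dependencies. This is only an illustration and deliberately does not use the Google client library: `builtin_is_retryable`, `extend_predicate`, and the exception classes are hypothetical stand-ins. The point is that the extended predicate keeps everything the built-in predicate already accepts, so retry cases added upstream later are picked up automatically.

```python
from typing import Callable, Tuple, Type

Predicate = Callable[[BaseException], bool]


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 error."""


class BadGateway(Exception):
    """Hypothetical stand-in for an HTTP 502 error."""


class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP 400 error (not retryable here)."""


def builtin_is_retryable(error: BaseException) -> bool:
    # Stand-in for a library-provided default predicate (assumption).
    return isinstance(error, ServiceUnavailable)


def extend_predicate(
    base: Predicate, extra: Tuple[Type[BaseException], ...]
) -> Predicate:
    # Keep whatever the base predicate accepts and add our own types on top,
    # rather than replacing the base predicate outright.
    def is_retryable(error: BaseException) -> bool:
        return base(error) or isinstance(error, extra)

    return is_retryable


is_retryable = extend_predicate(builtin_is_retryable, (BadGateway,))
```

With this shape, `is_retryable` accepts both the built-in case and the added one, while unrelated errors such as `BadRequest` are still rejected.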

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@mikealfare mikealfare self-assigned this Nov 20, 2024
@cla-bot cla-bot bot added the cla:yes label Nov 20, 2024
Contributor

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-bigquery contributing guide.

Contributor Author

@mikealfare mikealfare left a comment

I left a self-review for context. Also, given this whole PR is retrying transient errors, what is expected from a testing effort? We already have some unit tests that verify that we retry errors as expected, but they don't test them in the context of other calls. e.g. we don't verify that if we try to load a file into BQ and we get a 503, that we in fact retry. Would testing this be going too far?
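
One possible shape for the kind of test asked about above, as a rough sketch only: a fake load call fails once with a 503 and then succeeds, and the test checks that the call was retried. `call_with_retry` and `ServiceUnavailable` are hypothetical stand-ins written for this example; they are not the adapter's or the Google client's actual API.

```python
import time
from unittest import mock


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 error."""


def call_with_retry(func, attempts=3, delay=0.0):
    # Hypothetical helper: retry `func` on ServiceUnavailable,
    # giving up and re-raising after `attempts` tries.
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ServiceUnavailable:
            if attempt == attempts:
                raise
            time.sleep(delay)


def test_load_is_retried_on_503():
    # The fake load call fails once with a 503, then succeeds.
    load_file = mock.Mock(side_effect=[ServiceUnavailable("503"), "loaded"])
    assert call_with_retry(load_file) == "loaded"
    assert load_file.call_count == 2
```

A test like this exercises the retry in the context of a specific call without touching BigQuery, at the cost of mocking the call itself.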

dbt/adapters/bigquery/clients.py (outdated)
from google.api_core.future.polling import DEFAULT_POLLING
from google.api_core.retry import Retry
from google.cloud.bigquery.retry import DEFAULT_RETRY
from google.cloud.exceptions import BadGateway, BadRequest, ServerError
from google.cloud.bigquery.retry import DEFAULT_JOB_RETRY
Contributor Author

Since we're generally retrying jobs, we should use this default instead of DEFAULT_RETRY. We still use DEFAULT_RETRY for the client factories, where it makes more sense.

deadline=self._job_deadline,
on_error=_create_reopen_on_error(connection),

retry = DEFAULT_JOB_RETRY.with_delay(maximum=3.0).with_predicate(
Contributor Author

We were always attaching a deadline here, which could be None if the user did not set one. I believe this would cause the retry to go on for quite a while, depending on what job_retries is set to. We should instead inherit the default from DEFAULT_JOB_RETRY.

minimum=1.0 is the default.

The default for maximum is 60. Should we adjust this?
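
To make the two concerns above concrete, here is a simplified, dependency-free sketch. It is not the google-api-core implementation: it ignores jitter, and the multiplier of 2.0 and the 300-second fallback are assumptions chosen for illustration. `backoff_delays` and `effective_timeout` are hypothetical helper names.

```python
from typing import List, Optional


def backoff_delays(
    initial: float, maximum: float, multiplier: float, count: int
) -> List[float]:
    # Simplified exponential backoff: no jitter, each delay capped at `maximum`.
    delays = []
    delay = initial
    for _ in range(count):
        delays.append(min(delay, maximum))
        delay *= multiplier
    return delays


def effective_timeout(user_timeout: Optional[float], default: float) -> float:
    # Fall back to a default when the user did not configure a timeout,
    # so the retry loop is always bounded.
    return default if user_timeout is None else user_timeout


# A cap of 3.0 keeps the waits short; a cap of 60.0 lets them grow much longer.
short_waits = backoff_delays(initial=1.0, maximum=3.0, multiplier=2.0, count=5)
long_waits = backoff_delays(initial=1.0, maximum=60.0, multiplier=2.0, count=7)
```

Under these assumptions, `short_waits` is `[1.0, 2.0, 3.0, 3.0, 3.0]` and `long_waits` is `[1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]`, which shows how much the `maximum` setting affects total time spent waiting between attempts.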

@mikealfare mikealfare marked this pull request as ready for review November 20, 2024 21:31
@mikealfare mikealfare requested a review from a team as a code owner November 20, 2024 21:31
@mikealfare
Contributor Author

We should wait until #1431 is merged and rebase this on that. They address similar things and that PR covers a lot of what was initially implemented here.

@mikealfare mikealfare marked this pull request as draft December 12, 2024 16:31
@mikealfare
Contributor Author

Marking this as a draft until #1431 is merged.

@mikealfare mikealfare marked this pull request as ready for review December 16, 2024 17:34
@mikealfare mikealfare enabled auto-merge (squash) December 16, 2024 17:35
@mikealfare mikealfare merged commit a219818 into main Dec 17, 2024
15 checks passed
@mikealfare mikealfare deleted the retry-on-503 branch December 17, 2024 22:19
Contributor

github-actions bot commented Jan 9, 2025

The backport to 1.9.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.9.latest 1.9.latest
# Navigate to the new working tree
cd .worktrees/backport-1.9.latest
# Create a new branch
git switch --create backport-1408-to-1.9.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a219818c5a38339568bfb4e561405cfe8f6732eb
# Push it to GitHub
git push --set-upstream origin backport-1408-to-1.9.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.9.latest

Then, create a pull request where the base branch is 1.9.latest and the compare/head branch is backport-1408-to-1.9.latest.

colin-rogers-dbt pushed a commit that referenced this pull request Jan 9, 2025
* add default retry on all client factories, which includes 502 and 503 errors
* update retries to use defaults and ensure that a timeout or deadline is set

(cherry picked from commit a219818)
@colin-rogers-dbt colin-rogers-dbt mentioned this pull request Jan 9, 2025
4 tasks
colin-rogers-dbt added a commit that referenced this pull request Jan 10, 2025
* Retry on 503 (#1408)

* add default retry on all client factories, which includes 502 and 503 errors
* update retries to use defaults and ensure that a timeout or deadline is set

(cherry picked from commit a219818)

* remove hatch.toml

---------

Co-authored-by: Mike Alfare <[email protected]>
Development

Successfully merging this pull request may close these issues.

[ADAP-498] [Bug] BQ does not retry on 503
2 participants