retry wait for result independently from job creation #1042
@@ -787,7 +789,7 @@ def reopen_conn_on_error(error):
 target=fn,
 predicate=_ErrorCounter(self.get_job_retries(conn)).count_error,
 sleep_generator=self._retry_generator(),
-deadline=self.get_job_retry_deadline_seconds(conn),
+timeout=self.get_job_retry_deadline_seconds(conn),
It's just that the previous field (deadline) is deprecated in the driver; the behavior is the same.
Force-pushed from 36051fb to ccb141d.
@github-christophe-oudar I wanted to point you to this open PR of mine, as I think the two are related: #977
@McKnight-42 thanks for pointing it out 👍
@github-christophe-oudar The ticket is still being worked on. I had to set it aside for a bit due to other work and traveling, but it's on my board. I plan to set aside some time to dig into these PRs and problems a little more over the next few days and will keep you updated.
Ok, great to know!
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days.
Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers.
resolves #1045
Problem
When an error occurs during either job creation or waiting for the result, the combined "create job + wait for result" step is retried as a whole.
The wait-for-result step polls for the result every X seconds, so a single failed poll (a network error, for example) can cause the whole job to be retried, including its creation.
If the job isn't idempotent, this leads to a bug (which is what happened to a coworker).
If the job is idempotent, you have likely wasted slot time/BQ resources.
Solution
To solve that, let's split the step into two functions that are each retried on their own, so that a failure while waiting retries access to the already running job instead of creating a new one.
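The difference between the two shapes can be sketched as follows. This is a minimal, standard-library-only illustration and not the adapter's actual code: `TransientError`, `retry`, `FakeClient`, `run_coupled`, and `run_split` are hypothetical names invented for the example, and the fake client only counts how many jobs get submitted.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a retryable network error."""


def retry(fn, max_attempts=3):
    """Call fn until it succeeds or max_attempts is exhausted (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise


class FakeClient:
    """Hypothetical in-memory client that counts submitted jobs."""

    def __init__(self, poll_failures=1):
        self.jobs_created = 0
        self._poll_failures = poll_failures

    def create_job(self, sql):
        self.jobs_created += 1
        return f"job-{self.jobs_created}"

    def wait_for_result(self, job_id):
        # Fail the first `poll_failures` polls to simulate a network blip.
        if self._poll_failures > 0:
            self._poll_failures -= 1
            raise TransientError("network error while polling")
        return f"result of {job_id}"


def run_coupled(client, sql):
    # One retried unit: an error while polling resubmits the job.
    def create_and_wait():
        job_id = client.create_job(sql)
        return client.wait_for_result(job_id)

    return retry(create_and_wait)


def run_split(client, sql):
    # Two retried units: polling retries reuse the job that already exists.
    job_id = retry(lambda: client.create_job(sql))
    return retry(lambda: client.wait_for_result(job_id))
```

With one simulated polling failure, `run_coupled` submits the job twice, while `run_split` submits it once and only repeats the wait.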
Checklist