[Bug] Dataproc (Python models) do not retry polling on 5xx errors #557
Labels
feature:python-models
Issues related to python models
pkg:dbt-bigquery
Issue affects dbt-bigquery
type:enhancement
New feature request
Is this a new bug in dbt-bigquery?
Current Behavior
After dbt-bigquery submits a Dataproc batch job, it enters a polling loop, waiting for a response that indicates the job has completed (dbt/adapters/bigquery/dataproc/batch.py#29). This polling does not retry transient errors, so the dbt run fails with an error even though the actual Dataproc job runs to completion successfully.
Expected Behavior
We should perform the Dataproc batch polling with a retry strategy so that transient errors are retried.
The BatchControllerClient.get_batch() function takes a retry parameter, so we should pass one in!
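To make the expected behavior concrete, here is a minimal, library-agnostic sketch of polling with retries on transient errors. It is not the adapter's actual code: `TransientApiError`, `call_with_retry`, `poll_until_done`, and the backoff defaults are all hypothetical names and values chosen for illustration. In the adapter itself, the equivalent would presumably be a retry object handed to get_batch() through its retry parameter rather than a hand-rolled loop.

```python
import time
from typing import Callable, FrozenSet, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientApiError(Exception):
    """Hypothetical stand-in for a 5xx response from the Dataproc API."""


def call_with_retry(
    fn: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (TransientApiError,),
    initial_delay: float = 1.0,
    multiplier: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff on retryable errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * multiplier, max_delay)
    raise AssertionError("unreachable")


def poll_until_done(
    get_state: Callable[[], str],
    terminal_states: FrozenSet[str] = frozenset({"SUCCEEDED", "FAILED", "CANCELLED"}),
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state until a terminal state is seen.

    Each individual poll is wrapped in call_with_retry, so a single
    transient error no longer aborts the whole wait.
    """
    while True:
        state = call_with_retry(get_state, sleep=sleep)
        if state in terminal_states:
            return state
        sleep(poll_interval)
```

Here `get_state` stands in for whatever wraps the get_batch() call and returns the batch state as a string. Only errors listed in `retryable` are retried; anything else propagates immediately, so genuine failures still surface.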
Steps To Reproduce
This relies on the GCP API throwing a 5xx error, so it's not easy to reproduce, though we are hitting the error at least once a day in our runs.
Relevant log output
Environment
Additional Context
Feels similar to dbt-labs/dbt-bigquery#682, but I believe the Dataproc element means we're dealing with a separate code path that doesn't benefit from RETRYABLE_ERRORS. The stacktrace also doesn't mention dbt/adapters/bigquery/connections.py.