-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for non-incremental materialized views #1255
base: main
Are you sure you want to change the base?
Conversation
…n_incremental_mat_view
Closing and re-opening to re-trigger CI post GH outage. |
…ly changed options
…dbt-bigquery into non_incremental_mat_view
dbt/adapters/bigquery/relation.py
Outdated
config_change_collection.options = BigQueryOptionsConfigChange( | ||
action=RelationConfigChangeAction.alter, | ||
context=new_materialized_view.options, | ||
# get an options change object with only the options that have changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As opposed to what. Pardon, just not sure what this comment refers to with the prexisting code. Were we instantiating the object in a different way? I see you're no longer using kwargs which is fine, but what's the intent here?
edit: Oh I think I get it now based on this below https://github.com/dbt-labs/dbt-bigquery/pull/1255/files#r1639268560. I really like the from
style creation mechanism -- reminds me of Rust trait From
, but I think we might alter how you've implemented it slightly!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were creating an options object with all options, not just those that changed, and it was creating issues when altering the options. For example, options that force a full refresh would always show up in the change even if they didn't change, hence changing an option that could be implemented via an ALTER
statement would still trigger a full refresh due to the presence of the other options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to indicate that better in the comment. I don't know how it would be good to say that but something like "This is how is used to be done. This is why we were are doing it this way deliberately. This resulted in X. We want to avoid X."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After going around the block on this a few times today, I'm not adopting the approach of simply submitting the entire new options config. This comment no longer applies. However there are new comments for the new scenario. I'm looking for feedback on those comments as a result of this thread.
@@ -22,6 +23,7 @@ class BigQueryOptionsConfig(BigQueryBaseRelationConfig): | |||
refresh_interval_minutes: Optional[float] = 30 | |||
expiration_timestamp: Optional[datetime] = None | |||
max_staleness: Optional[str] = None | |||
allow_non_incremental_definition: Optional[bool] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This continues the comment above regarding defaulting Optional[bool]
s to None
in BigQueryConfig
. The intent in this section is to match the defaulting behavior of BigQuery according to their docs. The conclusion that we reached above is that we should have a comment regarding the fact that BQ effectively allows a third falsy value (None
) for some booleans (allow_non_incremental_definition
). There is a link in the docstring to those configurations. While I don't want to pull too much information from that link (to avoid getting out of sync with BQ docs), I will add a generic comment to the top that indicates that some booleans are option in BQ in the sense that there is literally no setting if it's not provided (versus defaulting to false). Does that work?
if new_options_dict[k] != existing_options_dict[k] | ||
} | ||
option_diffs = BigQueryOptionsConfig.from_dict(option_diffs_dict) | ||
return cls(action=RelationConfigChangeAction.alter, context=option_diffs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I think what you were doing ahead now. I'm not sure I like burrying the alter here. May be a matter of preference. I kind of want to float this action param up for visibility when creating the object as opposed to hardcoding the value down here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, but I'll comment with what's coming to mind.
Most things are an alter
in this framework. The other options are create
and drop
, which really was only used for indexes in dbt-postgres
where the entire object (index, not the relation) would be created or dropped instead of altered. To alter an index in dbt-postgres
you need to issue two statements, one to drop the existing, and another to recreate it with the new config. In this case, we're really just altering the options on the relation (in this case an MV, but could be a table in the future). Sometimes that happens via a full refresh, which gets determined based on the context.
I agree we could spend more time on this framework to make it easier to parse (and I would like to do so if given capacity), but updating that would be outside of the scope of this particular change. Hence I left it with the default alter
since I need to provide something. Unfortunately, if you follow the code through, this attribute never gets used, only context
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree on not scoping up. I'd just rather have an empty param here and pass in the alter
deliberately at the function call level above. Right now you're making an assumption that I worry may not be obvious to the next person. But with the generic param passing this argument from the function call, it would be quite straightforward for adding others as needed (even if it never happens, I think it'd read clearer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say to pass it in to the function call above, are you saying to pass it in to from_options_configs
? While it's static now, it may not be in the future. In fact, I could see accounting for "unsetting" options in the future, would would just be a drop. Then I would need to compare the two options configs to see if it's an alter or a drop/create scenario. That would make from_options_configs
duplicative. My intent for this method was basically "here are my two options configs, please tell me what I need to do to get from A to B". I don't think supplying the action as an argument as well makes it more readable. I could set the action prior to the return with a comment that all diffs are alter diffs at this point, which may change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the change of providing the entire options config as context for the change, there is no need for a from_options_config
method. Hence this is now back in the calling method. I think this is what you were asking for, but please let me know if that's incorrect.
@@ -63,6 +65,9 @@ def change_config_via_replace(project, materialized_view): | |||
initial_model.replace(old_partition, new_partition) | |||
.replace("'my_base_table'", "'my_other_base_table'") | |||
.replace('cluster_by=["id", "value"]', 'cluster_by="id"') | |||
.replace( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also test going from allow_non_incremental_definition=False to allow_non_incremental_definition=True
@mikealfare What should we do in the rest? If there is anything I can help, please let me know. |
resolves #672
This PR continues @bnaul's work in #1011.
This PR has become more complicated than expected and will require some discussion prior to moving forward. There are a number of items that need to be addressed:
Checklist