
Snapshotting silently fails on values greater than int max value when source dtype changes to bigint #759

Open
jelstongreen opened this issue Aug 7, 2024 · 3 comments
Labels
bug Something isn't working help wanted Looking for community help to implement

Comments

@jelstongreen

Describe the bug

We noticed that when a source column changes from int to bigint but the downstream snapshot table is not updated to match, the snapshot keeps re-inserting the old record every time it runs, and it raises no error to say that the new value cannot be inserted.

Steps To Reproduce

1. Snapshot a source model with a regular int column.
2. Change the source model column to bigint and set the value in a row to one larger than the max int value.
3. Run dbt snapshot against the source table. The snapshot should use the check strategy, with the changed column included in its list of checked columns.

Expected behavior

If the target column's dtype is int and the incoming value is a bigint that exceeds the int range, the snapshot should raise an error rather than fail silently.
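
To make the expected behavior concrete, here is a minimal Python sketch of the kind of range check that would surface the problem. It is an illustration only: `check_fits` and `INT_RANGES` are hypothetical names, not part of dbt or dbt-databricks, and the bounds are the standard signed 32-bit and 64-bit integer ranges.

```python
# Hypothetical illustration only; not part of dbt or dbt-databricks.
# Signed integer ranges for 32-bit (int) and 64-bit (bigint) columns.
INT_RANGES = {
    "int": (-2**31, 2**31 - 1),
    "bigint": (-2**63, 2**63 - 1),
}


def check_fits(value: int, target_dtype: str) -> int:
    """Return value unchanged if it fits target_dtype, otherwise raise."""
    low, high = INT_RANGES[target_dtype]
    if not low <= value <= high:
        raise OverflowError(
            f"value {value} does not fit in {target_dtype} "
            f"(allowed range {low}..{high})"
        )
    return value
```

With this sketch, `check_fits(2**31, "int")` raises `OverflowError`, while the same value passes for `"bigint"`.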

Screenshots and log output

If applicable, add screenshots or log output to help explain your problem.

System information

The output of dbt --version:

<output goes here>

The operating system you're using:

The output of python --version:

Additional context

Add any other context about the problem here.

@jelstongreen jelstongreen added the bug Something isn't working label Aug 7, 2024
@benc-db benc-db added the help wanted Looking for community help to implement label Aug 8, 2024
@henlue
Contributor

henlue commented Oct 29, 2024

I would have time to work on this.

It looks like the merge statement that is currently executed always tries to cast the data types. Even if the cast fails (for example, string to int), it inserts null, and Databricks raises no error.
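
As a rough illustration of that behavior (a Python sketch, not the actual merge logic or any Databricks code), a permissive cast turns anything it cannot convert into null instead of raising. `permissive_cast_to_int` is a hypothetical name, and treating out-of-range values the same way as unparseable strings is an assumption made for the example.

```python
from typing import Optional

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # signed 32-bit int bounds


def permissive_cast_to_int(value: object) -> Optional[int]:
    """Mimic a cast that yields None (null) on failure instead of an error."""
    try:
        result = int(value)
    except (TypeError, ValueError):
        return None  # e.g. a non-numeric string
    if not INT_MIN <= result <= INT_MAX:
        # Assumption for this sketch: out-of-range values also become null.
        return None
    return result
```

Because the failure is swallowed, the caller cannot tell a genuine null from a value that failed to cast, which is why nothing is reported.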

I would propose checking for schema changes in dbt by using the check_for_schema_changes macro from dbt core, which is already used to check for schema changes in incremental models.

@benc-db should this check always run, or would it make sense to make it configurable? Similar to incremental models, there could be an on_schema_change config option. It could default to "ignore" to keep the current behavior, and check_for_schema_changes would run only if it is set to "fail".
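
A minimal sketch of the proposed behavior, written in Python rather than Jinja for brevity. It does not call or reproduce the real check_for_schema_changes macro: `detect_type_changes`, `apply_on_schema_change`, and `SchemaChangeError` are hypothetical names, and columns are modeled as plain name-to-dtype dicts.

```python
from typing import Dict, Tuple


class SchemaChangeError(Exception):
    """Raised when source and snapshot column types differ (hypothetical)."""


def detect_type_changes(
    source: Dict[str, str], target: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Map each shared column whose dtype differs to (target dtype, source dtype)."""
    return {
        name: (target[name], dtype)
        for name, dtype in source.items()
        if name in target and target[name] != dtype
    }


def apply_on_schema_change(
    source: Dict[str, str],
    target: Dict[str, str],
    on_schema_change: str = "ignore",
) -> Dict[str, Tuple[str, str]]:
    """Return detected changes; raise if configured to fail on any change."""
    if on_schema_change not in ("ignore", "fail"):
        raise ValueError(f"unsupported on_schema_change value: {on_schema_change!r}")
    changes = detect_type_changes(source, target)
    if on_schema_change == "fail" and changes:
        details = ", ".join(
            f"{name}: {old} -> {new}" for name, (old, new) in sorted(changes.items())
        )
        raise SchemaChangeError(f"column types changed: {details}")
    return changes
```

With the default `"ignore"`, a source of `{"id": "bigint"}` against a target of `{"id": "int"}` only reports the difference; with `"fail"`, the same inputs raise `SchemaChangeError` before any merge would run.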

@benc-db
Collaborator

benc-db commented Oct 29, 2024

@mikealfare Would love to hear your thoughts on @henlue's comment.

@mikealfare
Contributor

Without thinking about it too much, adding on_schema_change to snapshots sounds like a good idea. We're in the process of updating snapshots to add more features in general. I would post some comments on this issue. It collects all of the snapshot work being done. I don't know if this particular feature is included, but there are certainly schema-related items on there.
