-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated #678
Comments
Thanks for opening this @Bonnevie ! I made an attempt at reproducing this, but wasn't able to trigger the same error as you. Could you tweak the simple example below to reproduce what you were seeing? I am using a local postgres database as described here so you'll need to tweak your database, schema, etc as-needed. dbt project
{% snapshot my_snapshot %}
{{
config(
target_database='postgres',
target_schema='snapshots',
strategy='check',
check_cols='all',
unique_key='name'
)
}}
select * from {{ source("dbt_dbeatty", "person") }}
{% endsnapshot %} Steps
Using postgres IDE of choice: CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
CREATE TABLE dbt_dbeatty.person (
name text,
current_mood mood
);
INSERT INTO dbt_dbeatty.person VALUES ('Moe', 'happy');
dbt build
Using postgres IDE of choice again: truncate TABLE dbt_dbeatty.person;
INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');
dbt build |
I'm also experiencing this issue. The macro postgres__get_columns_in_relation, which is called when the snapshot checks for any column changes, selects As a temporary fix, I have overridden postgres__get_columns_in_relation in override_default_macros.sql to resolve udt names along with their schema:
|
After attempting again with the following steps, I was able to reproduce the reported issue. Also confirmed that the solution from @benorourke worked 🏅 Steps
Using postgres IDE of choice: CREATE TABLE dbt_dbeatty.person (
name text
);
INSERT INTO dbt_dbeatty.person VALUES ('Moe');
dbt snapshot
Build source table with custom enum type in postgres (using example here). Using postgres IDE of choice again: CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');
-- Yeah, a simple ALTER TABLE ... would be better here! But this is sufficient to reproduce
DROP TABLE dbt_dbeatty.person;
CREATE TABLE dbt_dbeatty.person (
name text,
current_mood mood
);
INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');
dbt snapshot 💥
Copy-paste the solution described here (untested) or here into
dbt snapshot ✅ |
I have a similar bug, which relates to this: Environment: |
Here is an alternative override for {% macro postgres__get_columns_in_relation(relation) -%}
{% call statement('get_columns_in_relation', fetch_result=True) %}
select
col.attname as column_name,
-- Two options for formatting the data type:
-- 1) do not include the type modifier (e.g., `geometry`)
-- 2) include the type modifier when appliable (e.g., `geometry(Point,4326)`)
-- Option 1:
pg_catalog.format_type(col.atttypid, null) as data_type,
-- Option 2:
-- pg_catalog.format_type(col.atttypid, col.atttypmod) as data_type,
null as character_maximum_length,
null as numeric_precision,
null as numeric_scale
from pg_catalog.pg_namespace sch
join pg_catalog.pg_class tbl on tbl.relnamespace = sch.oid
join pg_catalog.pg_attribute col on col.attrelid = tbl.oid
left outer join pg_catalog.pg_description tbl_desc on (tbl_desc.objoid = tbl.oid and tbl_desc.objsubid = 0)
left outer join pg_catalog.pg_description col_desc on (col_desc.objoid = tbl.oid and col_desc.objsubid = col.attnum)
where 1=1
and not pg_is_other_temp_schema(sch.oid) -- not a temporary schema belonging to another session
and tbl.relpersistence in ('p', 'u') -- [p]ermanent table or [u]nlogged table. Exclude [t]emporary tables
and tbl.relkind in ('r', 'v', 'f', 'p') -- o[r]dinary table, [v]iew, [f]oreign table, [p]artitioned table. Other values are [i]ndex, [S]equence, [c]omposite type, [t]OAST table, [m]aterialized view
and col.attnum > 0 -- negative numbers are used for system columns such as oid
and not col.attisdropped -- column has not been dropped
and tbl.relname = '{{ relation.identifier }}'
{% if relation.schema %}
and sch.nspname = '{{ relation.schema }}'
{% endif %}
order by
sch.nspname,
tbl.relname,
col.attnum
{% endcall %}
{% set table = load_result('get_columns_in_relation').table %}
{{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %} |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
This issue also exists in unit tests. The workaround provided by @dbeatty10 gets us past the |
Is this a new bug in dbt-core?
Current Behavior
I've set-up a snapshot table in a postgres database, and it has a column named
annotation_type
that uses a custom enum datatype. This has worked without problems for weeks, but after the table had to be truncated and restored due to a faulty replica setup we encountered the following error when trying t:I cannot find the sql file that the error is in reference to.
the snapshot table was also suddenly lacking the
annotation_type
column, despite snapshots supposedly never deleting columns?We are at a development stage where we can still afford to do a full restart of the snapshot, but we'd rather not encounter this in the future.
Our hypothesis is that the custom type is somehow not matched appropriately following regeneration?
Expected Behavior
I would expect snapshots to be robust to no-ops like this, and at least to never incur data loss.
If custom types are unsupported/problematic, it'd be nice to be warned.
Steps To Reproduce
Not easily reproduced, but most likely course:
Relevant log output
No response
Environment
No response
Which database adapter are you using with dbt?
postgres
Additional Context
No response
The text was updated successfully, but these errors were encountered: