Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-1027] [CT-3367] [Feature] BigQuery Foreign / Primary Key Support #1016

Closed
3 tasks done
ckchow opened this issue Nov 9, 2023 · 6 comments
Closed
3 tasks done
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@ckchow
Copy link

ckchow commented Nov 9, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

BigQuery recently added the ability to define (unenforced!) PK/FK relationships between tables. https://cloud.google.com/blog/products/data-analytics/join-optimizations-with-bigquery-primary-and-foreign-keys

dbt should be able to manage the expression of these constraints on BigQuery models so that the BigQuery analyzer can take advantage of optimizations that these relations enable.

Describe alternatives you've considered

It's possible to manually insert these references manually, but they will be eliminated every time the warehouse update runs.

ALTER table inventory ADD primary key(inv_date_sk, inv_item_sk, inv_warehouse_sk) NOT ENFORCED,
ADD FOREIGN KEY(inv_date_sk) references date_dim(d_date_sk) NOT ENFORCED,
ADD FOREIGN KEY(inv_item_sk) references item(i_item_sk) NOT ENFORCED,
ADD FOREIGN KEY(inv_warehouse_sk) references warehouse(w_warehouse_sk) NOT ENFORCED;

Who will this benefit?

This will significantly benefit most BigQuery users of dbt. Dbt by default generates view models, and the key constraints will allow the analyzer to perform optimizations on complex queries that are executed by joining multiple views that are themselves joins.

Are you interested in contributing this feature?

I could help, I'm not really versed in how the dbt codebase is organized however.

Anything else?

Bigquery documentation for key DDL: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#alter_table_add_primary_key_statement

@ckchow ckchow added enhancement New feature or request triage labels Nov 9, 2023
@github-actions github-actions bot changed the title [Feature] BigQuery Foreign / Primary Key Support [CT-3367] [Feature] BigQuery Foreign / Primary Key Support Nov 9, 2023
@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Nov 10, 2023
@github-actions github-actions bot changed the title [CT-3367] [Feature] BigQuery Foreign / Primary Key Support [ADAP-1027] [CT-3367] [Feature] BigQuery Foreign / Primary Key Support Nov 10, 2023
@dbeatty10
Copy link
Contributor

Thanks for reaching out @ckchow !

In dbt v1.5, we added the ability to define constraints like primary_key and foreign_key (even if they aren't enforced).

Wanna check it out and see if it meets the needs that you are intending?

Details

You can read more here:

The general form of a primary key constraint looks like the following.

models:
  - name: <model_name>
    
    # required
    config:
      contract:
        enforced: true
    
    # model-level constraints
    constraints:
      - type: primary_key
        columns: [<first_column>, <second_column>, ...]

@dbeatty10 dbeatty10 added enhancement New feature or request awaiting_response and removed enhancement New feature or request triage labels Nov 16, 2023
@elyobo
Copy link

elyobo commented Dec 19, 2023

It's great that this seems to be supported but the requirement to turn on full contract enforcement to enable adding constraints is a pain point, as that comes with a much heavier set of requirements. May be eased if/when partial contracts turn up.

When enforced, your contract must include every column's name and data_type (where data_type matches one that your data platform understands).

@mitalauskas
Copy link

mitalauskas commented Dec 28, 2023

Also, with constraints, there's no possibility to define multiple primary keys.
BigQuery allows this with the ALTER TABLE ADD PRIMARY KEY statement.
However, when using constraints the schema is defined like so:

create or replace table xyz
  id NUMERIC not null primary key not enforced,
  api_token STRING,
  email STRING,
  name STRING,
  created_at DATETIME,
  updated_at DATETIME,
  password STRING,
  valid_from DATETIME,
  valid_to DATETIME,
  is_latest BOOLEAN,
  partition_date DATE

And if you'd try to add PK on an additional column, for example, the valid_from, it would error to:
Database Error in model xyz.sql Multiple PRIMARY KEY definitions found in CREATE TABLE at [16:5]

@elyobo
Copy link

elyobo commented Jan 4, 2024

@mitalauskas how are you defining it? I noticed that the output was different defining model level constraints vs column level constraints, does it work if you define your multiple PKs in the model level constraints rather than column level constraints on each of the PKs?

@dbeatty10
Copy link
Contributor

the requirement to turn on full contract enforcement to enable adding constraints is a pain point, as that comes with a much heavier set of requirements

@elyobo agreed that it is definitely is a heavier set of requirements and is a pain point if you just want to manage adding primary keys or foreign keys.

But we're not planning on adding constraints separately at this time, so I'm going to close this as "not planned".

@mitalauskas you can configure a compound primary key like this by specifying it at the model level (rather than the column level):

models:
  - name: my_model
    
    config:
      contract:
        enforced: true
    
    # model-level constraints
    constraints:
      - type: primary_key
        columns: [pk1, pk2]

    columns:
      - name: pk1
        data_type: int64
      - name: pk2
        data_type: string

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 7, 2024
@dbeatty10 dbeatty10 added wontfix This will not be worked on and removed triage labels Feb 7, 2024
@elyobo
Copy link

elyobo commented Feb 7, 2024

Thanks for the update @dbeatty10. For those wanting the constraints for other reasons (e.g. join optimisations on BigQuery), I haven't tried but assume it could be wired up in post hooks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

4 participants