Workbench Reporting Dataset (WRD)
The Workbench Reporting Dataset (WRD) is a BigQuery dataset that provides a periodic, partial snapshot of the Workbench database to downstream consumers for reporting purposes. The primary consumer of this dataset is the PDR (Program Data Repository), which feeds analytics and dashboards.
- Original reporting design
- Reporting Terraform modules
- Reporting bootstrapping codegen (partially defunct, see RW-5695)
A change to the reporting dataset may involve two logical components:
- The API server code, which must be modified for nearly all reporting changes
- The live BigQuery schemas, managed via Terraform, which must be modified whenever the schema changes
Note that a Terraform schema change typically requires two PRs: one to update the schema/module definitions in the Terraform modules, and another to bump the module version, pulling in the new schemas and applying them in all environments.
Example PRs:
- Adding a new table: Terraform modules, Terraform version bump, API server
- Adding a new column: Terraform modules, Terraform version bump, API server
- Removing a column: not possible without recreating the table, which loses all existing data. We probably don't want to do this.
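Because only additive changes can be applied in place, it can help to classify a proposed schema change before opening PRs. The following is a minimal Python sketch, assuming each table schema is modeled as a map from column name to BigQuery type. The names `diff_schema` and `SchemaDiff` are hypothetical and not part of the Workbench codebase. As a conservative assumption, the sketch also flags type changes for manual review, since BigQuery allows only a narrow set of in-place type changes.

```python
from dataclasses import dataclass

# Hypothetical model: column name -> BigQuery type. This ignores column mode
# (NULLABLE/REQUIRED/REPEATED) and nested fields for simplicity.
Schema = dict[str, str]


@dataclass
class SchemaDiff:
    added: list[str]
    removed: list[str]
    retyped: list[str]

    @property
    def is_additive(self) -> bool:
        """True when the change only adds columns and can be applied in place."""
        return not self.removed and not self.retyped


def diff_schema(old: Schema, new: Schema) -> SchemaDiff:
    """Compare two table schemas column by column."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Conservatively treat any type change as needing manual review.
    retyped = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return SchemaDiff(added, removed, retyped)
```

For example, adding a hypothetical `billing_status` column yields an additive diff, while dropping an existing column sets `removed` and makes `is_additive` false, which is the case that would require recreating the table.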
The typical workflow for a reporting change:
- (If any schema updates are needed) Apply your schema changes to the `reporting_local` BigQuery dataset:
  - Make any schema changes in the reporting Terraform modules
  - Push up a branch with these changes
  - Locally in workbench-devops, temporarily change the module reference to point to your terraform-modules branch, e.g. change `?ref=v0.1.4` to `?ref=my/branch-123`
  - Apply the Terraform change to the local environment (instructions)
  - The `reporting_local` dataset in the test environment should now reflect your schema changes
- Get a local API server running
- Make any necessary changes to the reporting server code
- Run the local API server: `./project.rb dev-up`
- Ensure you have data locally; the easiest way to create it is to connect a local UI and create users/workspaces
- Invoke the reporting cron locally using this shell script / curl
- Verify that the new data appears in the `reporting_local` BigQuery dataset
- Send PRs for the server code change
- If BigQuery schema changes are needed (i.e. Terraform changes), follow the process for applying changes in Terraform and apply them to all environments
- Merge the server changes
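For the local verification step, comparing a table's row count before and after invoking the reporting cron is a quick signal that the snapshot landed. The following is a minimal sketch using the `google-cloud-bigquery` Python client. The project ID `all-of-us-workbench-test` (assumed to host `reporting_local` in the test environment) and the example table name `workspace` are assumptions, and `row_count_sql`/`count_rows` are hypothetical helpers.

```python
import re

# Assumption: the test-environment project that hosts reporting_local.
DEFAULT_PROJECT = "all-of-us-workbench-test"
DEFAULT_DATASET = "reporting_local"

# Table names cannot be bound as BigQuery query parameters, so the table
# name is validated before it is interpolated into the SQL string.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def row_count_sql(table: str, project: str = DEFAULT_PROJECT,
                  dataset: str = DEFAULT_DATASET) -> str:
    """Build a COUNT(*) query for one table in the reporting dataset."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT COUNT(*) AS n FROM `{project}.{dataset}.{table}`"


def count_rows(table: str, project: str = DEFAULT_PROJECT,
               dataset: str = DEFAULT_DATASET) -> int:
    """Run the count query with application-default credentials."""
    # Imported lazily so row_count_sql works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    rows = client.query(row_count_sql(table, project, dataset)).result()
    return next(iter(rows))["n"]
```

For example, call `count_rows("workspace")` before and after invoking the cron. An unchanged count suggests the snapshot did not run or did not write to that table.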
After finishing the above work, the following situations will need coordination with the PDR:
- A new table was added to the schema
- A synchronized table was removed (see RW-6696 for a pointer on what's currently synchronized)
If PDR coordination is needed, file a request here.
We can view the PDR dataset here: https://console.cloud.google.com/bigquery?project=all-of-us-rw-prod&d=rw_ops_data&p=aou-pdr-data-prod&page=dataset
Note: adding new columns or views does not require coordination.
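The coordination rules above can be encoded as a small checklist helper, for example in a release script. This is a sketch with hypothetical names (`ChangeKind`, `needs_pdr_coordination`). It mirrors only the rules listed on this page, and whether a removed table is synchronized must still be looked up manually (see RW-6696).

```python
from enum import Enum


class ChangeKind(Enum):
    """Hypothetical categories of reporting schema changes."""
    ADD_TABLE = "add_table"
    REMOVE_TABLE = "remove_table"
    ADD_COLUMN = "add_column"
    ADD_VIEW = "add_view"


def needs_pdr_coordination(kind: ChangeKind, synchronized: bool = False) -> bool:
    """Return True if a change of this kind needs a PDR coordination request.

    `synchronized` indicates whether a removed table is currently synced to
    the PDR; it is ignored for every other change kind.
    """
    if kind is ChangeKind.ADD_TABLE:
        return True
    if kind is ChangeKind.REMOVE_TABLE:
        return synchronized
    # New columns and views do not require coordination.
    return False
```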