Workbench Reporting Dataset (WRD)

Joel Thibault edited this page Sep 23, 2024 · 14 revisions

The Workbench Reporting Dataset (WRD) is a BigQuery dataset which provides a partial periodic snapshot of the Workbench database to downstream consumers for reporting purposes. The primary consumer of this dataset is the PDR (Program Data Repository), which feeds into analytics/dashboards.

Key links

Change management

Development

Making a change to the reporting dataset may involve interacting with two logical components:

  1. The API server code, which generally must be touched for any reporting change
  2. The live BigQuery schemas, managed via Terraform, which must be modified whenever the schema changes

Note that modifying the Terraform schema typically requires two PRs: one to update the schema/module definitions, and another to pull in the new schemas (and apply them) in all environments.

Example PRs:

Testing

  • If schema updates are needed, apply your schema changes to the reporting_local BigQuery dataset
  • Get a local API server running
    • Make any necessary changes to the reporting server code
    • Run the local API server: ./project.rb dev-up
    • Ensure you have data locally; the easiest way to create it is to connect a local UI and create users/workspaces
  • Trigger the reporting cron locally by running this shell script / curl
  • Verify that the new data appears in the reporting_local BigQuery dataset
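
One way to do the final verification step is to compare row counts before and after triggering the cron. The following is a minimal sketch, not part of the Workbench codebase: the function names, the project ID, and the table name are hypothetical placeholders, and only the reporting_local dataset name comes from this page. It assumes the google-cloud-bigquery package is installed and that local credentials can read the dataset.

```python
import re

# Only plain identifiers are interpolated into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_-]+")


def build_row_count_query(project: str, table: str, dataset: str = "reporting_local") -> str:
    """Return a query that counts the rows of one table in the reporting dataset.

    `project` and `table` are supplied by the caller; this page does not name them.
    """
    for part in (project, dataset, table):
        if not _IDENTIFIER.fullmatch(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"SELECT COUNT(*) AS row_count FROM `{project}.{dataset}.{table}`"


def count_rows(project: str, table: str) -> int:
    """Run the count query. Requires google-cloud-bigquery and valid credentials."""
    # Imported lazily so the query builder can be used without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    rows = client.query(build_row_count_query(project, table)).result()
    return next(iter(rows)).row_count
```

Calling count_rows before and after the cron run, with your own project ID and the table you changed, should show the count increasing if the upload succeeded.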

Deployment

  • Send PRs for the server code change
  • If BigQuery schema changes are needed (i.e. Terraform changes), follow the process for applying changes in Terraform and apply them to all environments
  • Merge the server changes

PDR coordination

After finishing the above work, the following changes require coordination with the PDR:

  • A new table was added to the schema
  • A synchronized table was removed (see RW-6696 for a pointer on what's currently synchronized)

If PDR coordination is needed, file a request here.

We can view the PDR dataset here: https://console.cloud.google.com/bigquery?project=all-of-us-rw-prod&d=rw_ops_data&p=aou-pdr-data-prod&page=dataset

Note: adding new columns or views does not require coordination with the PDR.
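
The coordination rules above can be expressed as a small check over the table names before and after a change. This is an illustrative sketch only: the function name, the input shape (a mapping from table name to column names, with views left out), and the example table names are assumptions, and the set of synchronized tables has to be supplied by the caller (RW-6696 points at the current list).

```python
def coordination_reasons(old_tables, new_tables, synchronized):
    """Return the reasons a schema change needs PDR coordination.

    old_tables / new_tables: dict mapping table name -> set of column names,
        before and after the change. Views are not included, since adding
        a view does not require coordination.
    synchronized: set of table names the PDR currently synchronizes.
    An empty result means no coordination is needed.
    """
    added = set(new_tables) - set(old_tables)
    removed_synchronized = (set(old_tables) - set(new_tables)) & set(synchronized)

    # Column differences are deliberately ignored: new columns need no coordination.
    reasons = [f"new table: {name}" for name in sorted(added)]
    reasons += [f"removed synchronized table: {name}" for name in sorted(removed_synchronized)]
    return reasons


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    before = {"user": {"id"}, "workspace": {"id"}}
    after = {"user": {"id", "email"}, "cohort": {"id"}}
    for reason in coordination_reasons(before, after, synchronized={"workspace"}):
        print(reason)
```

If the returned list is non-empty, file the PDR request described above before considering the change complete.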