Expanded readme #76

Merged on Dec 10, 2024 (2 commits)
README.md: 16 additions, 0 deletions
@@ -50,6 +50,22 @@ run_validation(
See the documentation of `dq_suite.validation.run` for the other parameters that can be passed.


# Other functionalities
## Export the schema from Unity Catalog to the Input Form
To export a schema from Unity Catalog, run the following commands, replacing `'schema_name'` with the name of the required schema:
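```python
import dq_suite

table = []  # optional: one or more table names, for more granular output
# `spark` is the SparkSession that Databricks notebooks provide by default
schema_output = dq_suite.schema_to_json_string('schema_name', spark, *table)
print(schema_output)
```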
Copy the printed string into the Input Form to quickly ingest the schema in Excel. The `table` parameter is optional; passing one or more tables gives more granular results.
## Validate the schema of a table
The schema of an entire table can be validated against a schema definition from Amsterdam Schema in one go. To do this, add two fields to the table's entry in the `dq_rules` JSON (see https://github.com/Amsterdam/dq-suite-amsterdam/blob/main/dq_rules_example.json).
You will need:
- `validate_table_schema`: the `id` field of the table in Amsterdam Schema
- `validate_table_schema_url`: the URL of the table or dataset in Amsterdam Schema

At run time, the schema definition is converted into column-level expectations (`ExpectColumnValuesToBeOfType`).
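To illustrate the idea of that run-time conversion, here is a minimal sketch that maps a heavily trimmed, Amsterdam Schema-style table definition to one `ExpectColumnValuesToBeOfType` rule per column. It is not dq_suite's implementation: the helper `schema_to_type_rules`, the `TYPE_MAP` from JSON Schema types to Spark type names, the rule dictionary layout and the `type_` parameter name are all assumptions made for this example.

```python
import json

# Hypothetical mapping from (JSON Schema type, format) to Spark type names;
# the mapping dq_suite actually uses may differ.
TYPE_MAP = {
    ("string", None): "StringType",
    ("string", "date"): "DateType",
    ("string", "date-time"): "TimestampType",
    ("integer", None): "IntegerType",
    ("number", None): "DoubleType",
    ("boolean", None): "BooleanType",
}


def schema_to_type_rules(table_schema: dict) -> list:
    """Hypothetical helper: build one type-check rule per column definition."""
    rules = []
    for column, definition in table_schema["schema"]["properties"].items():
        json_type = definition.get("type")
        # Prefer an exact (type, format) match, else fall back to the bare type.
        spark_type = TYPE_MAP.get((json_type, definition.get("format")))
        if spark_type is None:
            spark_type = TYPE_MAP.get((json_type, None))
        if spark_type is None:
            continue  # skip columns whose type this sketch does not map
        rules.append(
            {
                "rule_name": "ExpectColumnValuesToBeOfType",
                "parameters": {"column": column, "type_": spark_type},
            }
        )
    return rules


# Hypothetical, trimmed table definition in the style of Amsterdam Schema.
example_table = {
    "id": "example_table",
    "schema": {
        "properties": {
            "id": {"type": "integer"},
            "naam": {"type": "string"},
            "datum": {"type": "string", "format": "date"},
        }
    },
}

for rule in schema_to_type_rules(example_table):
    print(json.dumps(rule))
```

The sketch skips unmapped types instead of raising, which keeps the example short; a stricter validator would probably report such columns rather than ignore them.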


# Known exceptions / issues
- The functions can run on Databricks on a Personal Compute Cluster or a Job Cluster. Running them on a Shared Compute Cluster results in an error, because such a cluster lacks the permissions that Great Expectations requires.