Expanded readme (#76)
ArthurKordes authored Dec 10, 2024
1 parent dc019a1 commit 7f0d07e
Showing 1 changed file (README.md) with 16 additions and 0 deletions.
@@ -50,6 +50,22 @@ run_validation(
See the documentation of `dq_suite.validation.run` for what other parameters can be passed.


# Other functionalities
## Export the schema from Unity Catalog to the Input Form
To export a schema from Unity Catalog, use the following commands (substituting the required schema name):
```
schema_output = dq_suite.schema_to_json_string('schema_name', spark, *table)
print(schema_output)
```
Copy the string to the Input Form to quickly ingest the schema in Excel. The "table" parameter is optional; supplying it gives more granular results, as in the sketch below.
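For example, a per-table export might look like the following sketch (`my_schema` and `my_table` are hypothetical names; `spark` is the active SparkSession, and the call follows the signature shown above):
```
# Sketch only: 'my_schema' and 'my_table' are placeholder names.
schema_output = dq_suite.schema_to_json_string('my_schema', spark, 'my_table')
print(schema_output)
```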
## Validate the schema of a table
It is possible to validate the schema of an entire table against a schema definition from Amsterdam Schema in one go. This is done by adding two fields to the "dq_rules" JSON when describing the table (see: https://github.com/Amsterdam/dq-suite-amsterdam/blob/main/dq_rules_example.json).
You will need:
- validate_table_schema: the id field of the table from Amsterdam Schema
- validate_table_schema_url: the url of the table or dataset from Amsterdam Schema
The schema definition is converted into column-level expectations (`ExpectColumnValuesToBeOfType`) at run time. A sketch of the two fields follows below.
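A minimal sketch of how these two fields could appear in a table entry of the "dq_rules" JSON; the angle-bracket values are placeholders and the other table fields are omitted (see dq_rules_example.json for the authoritative layout):
```
{
    ...,
    "validate_table_schema": "<id field of the table from Amsterdam Schema>",
    "validate_table_schema_url": "<url of the table or dataset from Amsterdam Schema>"
}
```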


# Known exceptions / issues
- The functions can run on Databricks using a Personal Compute Cluster or using a Job Cluster.
Using a Shared Compute Cluster will result in an error, as it does not have the permissions that Great Expectations requires.