Releases: great-expectations/great_expectations
0.9.5
0.9.4
- Update CLI init flow to support snowflake transient tables
- Use filename for default expectation suite name in CLI init
- Tables created by SqlAlchemyDataset use a shorter name with 8 hex characters of randomness instead of a full uuid
- Better error message when config substitution variable is missing
- removed an unused directory in the GE folder
- removed obsolete config error handling
- Docs typo fixes
- Jupyter notebook improvements
- great_expectations init improvements
- Simpler messaging in validation notebooks
- replaced hacky loop with suite list call in notebooks
- CLI suite new now supports --empty flag that generates an empty suite and opens a notebook
- add error handling to init flow for cases where user tries using a broken file
0.9.3
- Add support for transient table creation in snowflake (#1012)
- Improve path support in TupleStoreBackend for better cross-platform compatibility
- New features on ExpectationSuite: add_citation() and get_citations() (see the example after this list)
- SampleExpectationsDatasetProfiler now leaves a citation containing the original batch kwargs
- great_expectations suite edit now uses batch_kwargs from citations if they exist
- Bugfix :: suite edit notebooks no longer blow away the existing suite while loading a batch of data
- More robust and tested logic in suite edit
- DataDocs: bugfixes and improvements for smaller viewports
- Bugfix :: fix for bug that crashes SampleExpectationsDatasetProfiler if unexpected_percent is of type decimal.Decimal (#1109, https://github.com/great-expectations/great_expectations/issues/1109)
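For example, a minimal sketch of the new citation methods, assuming an initialized project with an existing suite; the suite name, comment text, datasource name, and file path below are placeholders for illustration, not part of the release:

    import great_expectations as ge

    context = ge.data_context.DataContext()

    # "npi" is a placeholder suite name for this sketch
    suite = context.get_expectation_suite("npi")

    # Record which batch the expectations were built against
    suite.add_citation(
        comment="Edited against the March extract",
        batch_kwargs={"datasource": "my_datasource", "path": "data/npi_march.csv"},
    )

    # Later, recover the citations (and their batch_kwargs, which is what
    # great_expectations suite edit now uses when they exist)
    for citation in suite.get_citations():
        print(citation)

    context.save_expectation_suite(suite)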
0.9.2
This is a small release with a bug fix and a new feature added to the CLI. To see the names of all suites in your project, run great_expectations suite list or call list_expectation_suite_names() on your DataContext; a short example follows the details below.
Details
- Fixes #1095
- Added a list_expectation_suites function to data_context, and a corresponding CLI function, suite list. (Thanks @talagluck)
- CI no longer enforces legacy python tests.
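The same listing from Python, as a minimal sketch run from the root of an initialized project:

    import great_expectations as ge

    # Loads the project configuration from great_expectations/great_expectations.yml
    context = ge.data_context.DataContext()

    # Equivalent to running: great_expectations suite list
    for suite_name in context.list_expectation_suite_names():
        print(suite_name)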
0.9.1
0.9.0
Version 0.9.0 is a major update to Great Expectations!
The DataContext has continued to evolve into a powerful tool for ensuring that Expectation Suites can properly represent the way users think about their data, and upgrading will make it much easier to store and share expectation suites, and to build data docs that support your whole team.
You’ll get awesome new features, including improvements to the look of data docs and the ability to choose and store metrics for building flexible data quality dashboards.
The changes for version 0.9.0 fall into several broad areas:
Onboarding
Release 0.9.0 of Great Expectations makes it much easier to get started with the project. The init
flow has grown
to support a much wider array of use cases and to use more natural language rather than introducing
Great Expectations concepts earlier. You can more easily configure different backends and datasources, take advantage of guided walkthroughs to find and profile data, and share project configurations with colleagues.
If you have already completed the init
flow using a previous version of Great Expectations, you do not need to
rerun the command. However, there are some small changes to your configuration that will be required. See
migrating versions for details.
CLI Command Improvements
With this release we have introduced a consistent naming pattern for accessing subcommands based on the noun (a Great Expectations object like suite
or docs
) and verb (an action like edit
or new
). The new user experience will allow us to more naturally organize access to CLI tools as new functionality is added.
Expectation Suite Naming and Namespace Changes
Defining shared expectation suites and validating data from different sources is much easier in this release. The
DataContext, which manages storage and configuration of expectations, validations, profiling, and data docs, no
longer requires that expectation suites live in a datasource-specific “namespace.” Instead, you should name suites
with the logical name corresponding to your data, making it easy to share them or validate against different data
sources. For example, the expectation suite "npi" for National Provider Identifier data can now be shared across
teams who access the same logical data in local systems using Pandas, on a distributed Spark cluster, or via a
relational database.
Batch Kwargs, or instructions for a datasource to build a batch of data, are similarly freed from a required
namespace, and you can more easily integrate Great Expectations into workflows where you do not need to use a
BatchKwargsGenerator (usually because you have a batch of data ready to validate, such as in a table or a known
directory).
The most noticeable impact of this API change is in the complete removal of the DataAssetIdentifier class. For
example, the create_expectation_suite and get_batch methods no longer require a data_asset_name parameter, relying only on the expectation_suite_name and batch_kwargs to do their job. Similarly, there is no more asset name normalization required. See the upgrade guide for more information.
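For illustration, a minimal sketch of the simplified API; the datasource name, file path, suite name, and column below are placeholders:

    import great_expectations as ge

    context = ge.data_context.DataContext()

    # No data_asset_name (and no DataAssetIdentifier) required any more:
    # a logical suite name plus batch_kwargs is enough.
    context.create_expectation_suite("npi")

    batch = context.get_batch(
        batch_kwargs={"datasource": "my_pandas_datasource", "path": "data/npi.csv"},
        expectation_suite_name="npi",
    )
    batch.expect_column_values_to_not_be_null("npi")
    context.save_expectation_suite(batch.get_expectation_suite())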
Metrics and Evaluation Parameter Stores
Metrics have received much more love in this release of Great Expectations! We've improved the system for declaring evaluation parameters that support dependencies between different expectation suites, so you can easily identify a particular field in the result of one expectation to use as the input into another. And the MetricsStore is now much more flexible, supporting a new ValidationAction that makes it possible to select metrics from a validation result to be saved in a database where they can power a dashboard.
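As a small, self-contained sketch of an evaluation parameter (the parameter name and data are invented for illustration; in a real project the value would be drawn from a field of another suite's validation result or from an evaluation parameter store):

    import pandas as pd
    import great_expectations as ge

    batch = ge.from_pandas(pd.DataFrame({"npi": range(20)}))

    # Provide the parameter value; in practice it comes from another
    # suite's validation result
    batch.set_evaluation_parameter("upstream_row_count", 20)

    # Reference the parameter instead of hard-coding a threshold
    batch.expect_table_row_count_to_be_between(
        min_value={"$PARAMETER": "upstream_row_count"}
    )

    print(batch.validate().success)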
Internal Type Changes and Improvements
Finally, in this release, we have done a lot of work under the hood to make things more robust, including updating
all of the internal objects to be more strongly typed. That change, while largely invisible to end users, paves the
way for some really exciting opportunities for extending Great Expectations as we build a bigger community around
the project.
We are really excited about this release, and encourage you to upgrade right away to take advantage of the more
flexible naming and simpler API for creating, accessing, and sharing your expectations. As always feel free to join
us on Slack for questions you don't see addressed!
0.8.8
- Add support for allow_relative_error to expect_column_quantile_values_to_be_between, allowing Redshift users access to this expectation (see the example after this list)
- Add support for checking backend type information for datetime columns using expect_column_min_to_be_between and expect_column_max_to_be_between
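A hedged sketch of the quantile expectation; the column, quantiles, and value ranges are invented here. allow_relative_error matters on SQL backends such as Redshift, so the flag is left at its default in this Pandas example:

    import pandas as pd
    import great_expectations as ge

    df = ge.from_pandas(pd.DataFrame({"amount": range(1, 101)}))

    result = df.expect_column_quantile_values_to_be_between(
        "amount",
        quantile_ranges={
            "quantiles": [0.25, 0.50, 0.75],
            "value_ranges": [[20, 30], [45, 55], [70, 80]],
        },
        # On a Redshift-backed SqlAlchemyDataset, pass allow_relative_error=True
        # to permit approximate quantile computation.
    )
    print(result)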
0.8.7
0.8.6
- Raise informative error if config variables are declared but unavailable
- Update ExpectationsStore defaults to be consistent across all FixedLengthTupleStoreBackend objects
- Add support for setting spark_options via SparkDFDatasource
- Include tail_weights by default when using build_continuous_partition_object
- Fix Redshift quantiles computation and type detection
- Allow boto3 options to be configured (#887)
0.8.5
- BREAKING CHANGE: move all reader options from the top-level batch_kwargs object to a sub-dictionary called "reader_options" for SparkDFDatasource and PandasDatasource. This means it is no longer possible to specify supplemental reader-specific options at the top level of get_batch, yield_batch_kwargs or build_batch_kwargs calls, and instead, you must explicitly specify that they are reader_options, e.g. by a call such as: context.yield_batch_kwargs(data_asset_name, reader_options={'encoding': 'utf-8'}). (See the sketch at the end of this list.)
- BREAKING CHANGE: move all query_params from the top-level batch_kwargs object to a sub-dictionary called "query_params" for SqlAlchemyDatasource. This means it is no longer possible to specify supplemental query_params at the top level of get_batch, yield_batch_kwargs or build_batch_kwargs calls, and instead, you must explicitly specify that they are query_params, e.g. by a call such as: context.yield_batch_kwargs(data_asset_name, query_params={'schema': 'foo'}).
- Add support for filtering validation result suites and validation result pages to show only failed expectations in generated documentation
- Add support for limit parameter to batch_kwargs for all datasources: Pandas, SqlAlchemy, and SparkDF; add support to generators to support building batch_kwargs with limits specified.
- Include raw_query and query_params in query_generator batch_kwargs
- Rename generator keyword arguments from data_asset_name to generator_asset to avoid ambiguity with normalized names
- Consistently migrate timestamp from batch_kwargs to batch_id
- Include batch_id in validation results
- Fix issue where batch_id was not included in some generated datasets
- Fix rendering issue with expect_table_columns_to_match_ordered_list expectation
- Add support for GCP, including BigQuery and GCS
- Add support to S3 generator for retrieving directories by specifying the directory_assets configuration
- Fix warning regarding implicit class_name during init flow
- Expose build_generator API publicly on datasources
- Allow configuration of known extensions and return more informative message when SubdirReaderGenerator cannot find relevant files.
- Add support for allow_relative_error on internal dataset quantile functions, and add support for build_continuous_partition_object in Redshift
- Fix truncated scroll bars in value_counts graphs
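A brief sketch of the new calling convention described in the two breaking changes above, assuming a project whose DataContext already has Pandas (or Spark) and SqlAlchemy datasources configured; the asset names are placeholders:

    import great_expectations as ge

    context = ge.data_context.DataContext()

    # Reader-specific options now live under reader_options
    # (PandasDatasource and SparkDFDatasource)
    pandas_kwargs = context.yield_batch_kwargs(
        "my_csv_asset", reader_options={"encoding": "utf-8"}
    )

    # Query parameters now live under query_params (SqlAlchemyDatasource)
    sql_kwargs = context.yield_batch_kwargs(
        "my_table_asset", query_params={"schema": "foo"}
    )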