Merge pull request #245 from great-expectations/release

Create 0.4.0 release of great expectations
great-expectations · Mar 23, 2018 · 5793dba
2 parents e46913a + f91cade

Showing 103 changed files with 9,610 additions and 4,359 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -12,8 +12,8 @@ env:
 install:
   - pip install --only-binary=numpy,scipy numpy scipy
   - if [ $PANDAS=latest ]; then pip install pandas; else pip install pandas==$PANDAS; fi
-  - pip install coverage coveralls
-  - pip install -r requirements.txt
+  - pip install -r requirements-dev.txt
 script:
-  - coverage run --source great_expectations -m unittest tests
+  - pytest --cov=great_expectations tests/
+after_success:
   - coveralls
diff --git a/CONTRIBUTING b/CONTRIBUTING
@@ -0,0 +1,55 @@
+
+## How to contribute
+We're excited for contributions to Great Expectations. If you see places where the code or documentation could be improved, please get involved!
+
+Submitting your changes
+Once your changes and tests are ready to submit for review:
+
+1. Test your changes
+
+    Run the test suite to make sure that nothing is broken. See the section on testing below for help running tests. (Hint: `pytest` from the great_expectations root.)
+
+2. Sign the Contributor License Agreement
+
+    **When you contribute code, you affirm that the contribution is your original work and that you license the work to the project under the project’s open source license. Whether or not you state this explicitly, by submitting any copyrighted material via pull request, email, or other means you agree to license the material under the project’s open source license and warrant that you have the legal authority to do so.**
+
+    {Aspirational:} Please make sure you have signed our Contributor License Agreement. We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code. You only need to sign the CLA once.
+
+3. Rebase your changes
+
+    Update your local repository with the most recent code from the main Great Expectations repository, and rebase your branch on top of the latest `develop` branch. We prefer small, incremental commits, because it makes the thought process behind changes easier to review.
+
+4. Submit a pull request
+
+    Push your local changes to your forked copy of the repository and submit a pull request. In the pull request, choose a title that sums up the changes you have made, and in the body provide more details about what your changes do. Also mention the number of the issue where discussion has taken place, e.g. "Closes #123".
+
+5. Participate in review
+
+    There will probably be discussion about the pull request. It's normal for a request to require some changes before merging it into the main Great Expectations project. We enjoy working with contributors to get their code accepted. There are many approaches to fixing a problem and it is important to find the best approach before writing too much code.
+
+## Testing
+Currently (as of 3/9/2018), the tests are a bit of a mess. Consolidating them is an important next step. That means two things for contributors:
+
+First, don't worry about the mess. Write tests in whatever style suits your fancy. We (the core contributors) will worry about refactoring them later. As long as your thing works and is well-tested, you're good.
+
+(This is **not** an excuse to avoid writing tests. All contributions must be under test. We're just not dogmatic about the style of those tests today.)
+
+Second, if you have opinions on the testing framework, we'd love to hear them! Feedback based on your perspective and experience is very welcome.
+
+Most of the discussion to date is encapsulated here: https://github.com/great-expectations/great_expectations/issues/167. The `refactor_tests` branch is intended as a pilot implementation.
+
+## Conventions and Style
+
+* Avoid abbreviations (`column_idx` < `column_index`)
+* Use unambiguous expectation names, even if they're a bit longer. (`expect_columns_to_be` < `expect_columns_to_match_ordered_list`)
+
+Expectations aren't just tests---they're also a kind of data documentation. Because we want expectations to be easy to interpret, we're avoiding abbreviations almost everywhere. We're not entirely consistent about this yet, but there's pretty strong consensus among the early team and users that we should be heading in that direction.
+
+These guidelines should be followed consistently for methods and variables exposed in the API. They aren't intended to be strict rules for every internal line of code in every function.
+
+* Expectation names should reflect their decorators.
+
+`expect_table_...` for methods decorated directly with `@expectation`
+`expect_column_values_...` for `@column_map_expectation`
+`expect_column_...` for `@column_aggregate_expectation`
+`expect_column_pair_values...` for `@column_pair_map_expectation`
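
Reviewer note: the decorator-to-prefix pairings listed at the end of CONTRIBUTING can be checked mechanically. The Python sketch below is only an illustration under stated assumptions: `PREFIX_BY_DECORATOR` and `expected_decorator` are hypothetical names that do not exist in great_expectations, and the mapping encodes nothing beyond the four pairings listed above.

```python
# Hypothetical helper, not part of great_expectations. It encodes only the
# four decorator/prefix pairings listed in the CONTRIBUTING conventions.
PREFIX_BY_DECORATOR = {
    "expectation": "expect_table_",
    "column_map_expectation": "expect_column_values_",
    "column_aggregate_expectation": "expect_column_",
    "column_pair_map_expectation": "expect_column_pair_values",
}


def expected_decorator(expectation_name):
    """Return the decorator whose prefix best matches the name, or None."""
    # Try longer prefixes first so "expect_column_values_..." is not
    # misread as the more general "expect_column_...".
    by_length = sorted(
        PREFIX_BY_DECORATOR.items(),
        key=lambda item: len(item[1]),
        reverse=True,
    )
    for decorator, prefix in by_length:
        if expectation_name.startswith(prefix):
            return decorator
    return None


if __name__ == "__main__":
    for name in (
        "expect_table_row_count_to_equal",
        "expect_column_values_to_be_unique",
        "expect_column_mean_to_be_between",
    ):
        print(name, "->", expected_decorator(name))
```

Matching the longest prefix first is a design choice of this sketch: three of the four prefixes begin with `expect_column_`, so a plain `startswith` check in arbitrary order would be ambiguous.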
diff --git a/Changelog.md b/Changelog.md
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1 +1,4 @@
 include *.txt
+include LICENSE
+graft tests
+global-exclude *.py[co]
diff --git a/README.md b/README.md
@@ -10,11 +10,10 @@ Great Expectations
 *Always know what to expect from your data.*
 
 
-
 What is great_expectations?
 --------------------------------------------------------------------------------
 
-Great Expectations is a python framework for bringing data pipelines and products under test.
+Great Expectations is a framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time).
 
 Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.
 
@@ -31,5 +30,56 @@ To get more done with data, faster. Teams use great_expectations to
 * Simplify debugging data pipelines if (when) they break.
 * Codify assumptions used to build models when sharing with distributed teams or other analysts.
 
+How do I get started?
+--------------------------------------------------------------------------------
+
+It's easy! Just use pip install:
+
+
+    $ pip install great_expectations
+
+You can also clone the repository, which includes examples of using great_expectations.
+
+    $ git clone https://github.com/great-expectations/great_expectations.git
+    $ pip install great_expectations/
+
+What expectations are available?
+--------------------------------------------------------------------------------
+
+Expectations include:
+- `expect_table_row_count_to_equal`
+- `expect_column_values_to_be_unique`
+- `expect_column_values_to_be_in_set`
+- `expect_column_mean_to_be_between`
+- ...and many more
+
+Visit the [glossary of expectations](http://great-expectations.readthedocs.io/en/latest/glossary.html) for a complete list of expectations that are currently part of the great expectations vocabulary.
+
+Can I contribute?
+--------------------------------------------------------------------------------
+Absolutely. Yes, please. Start [here](https://github.com/great-expectations/great_expectations/blob/docs/contributor_docs/CONTRIBUTING), and don't be shy with questions!
+
+
+How do I learn more?
+--------------------------------------------------------------------------------
+
+For full documentation, visit [Great Expectations on readthedocs.io](http://great-expectations.readthedocs.io/en/latest/).
+
+[Down with Pipeline Debt!](https://medium.com/@expectgreatdata/down-with-pipeline-debt-introducing-great-expectations-862ddc46782a) explains the core philosophy behind Great Expectations. Please give it a read, and clap, follow, and share while you're at it.
+
+For quick, hands-on introductions to Great Expectations' key features, check out our walkthrough videos:
+
+* [Introduction to Great Expectations](https://www.youtube.com/watch?v=-_0tG7ACNU4)
+* [Using Distributional Expectations](https://www.youtube.com/watch?v=l3DYPVZAUmw&t=20s)
+
+
+What's the best way to get in touch with the Great Expectations team?
+--------------------------------------------------------------------------------
+
+[Issues on GitHub](https://github.com/great-expectations/great_expectations/issues). If you have questions, comments, feature requests, etc., [opening an issue](https://github.com/great-expectations/great_expectations/issues/new) is definitely the best path forward.
+
+
+Great Expectations doesn't do X. Is it right for my use case?
+--------------------------------------------------------------------------------
 
-Visit [the Great Expectations documentation](http://great-expectations.readthedocs.io/en/latest/) for more info.
+It depends. If you have needs that the library doesn't meet yet, please [upvote an existing issue](https://github.com/great-expectations/great_expectations/issues) or [open a new issue](https://github.com/great-expectations/great_expectations/issues/new) and we'll see what we can do. Great Expectations is under active development, so your use case might be supported soon.
diff --git a/bin/great_expectations b/bin/great_expectations
@@ -18,22 +18,13 @@ def initialize():
 
 @argh.arg('data_set')
 @argh.arg('expectations_config_file')
-@argh.arg('--output_format', '-o', default="SUMMARY")
+@argh.arg('--result_format', '-o', default="SUMMARY")
 @argh.arg('--catch_exceptions', '-e', default=True)
-@argh.arg('--include_config', '-n', default=None)
 @argh.arg('--only_return_failures', '-f', default=False)
 @argh.arg('--custom_dataset_module', '-m', default=None)
 @argh.arg('--custom_dataset_class', '-c', default=None)
 def validate(data_set, expectations_config_file, **kwargs):
 
-	if kwargs["include_config"]:
-		if kwargs["include_config"] == "True":
-			kwargs["include_config"] = True
-		elif kwargs["include_config"] == "False":
-			kwargs["include_config"] = False
-		else:
-			raise ValueError("includ_config expects None, True, or False. Got "+kwargs["include_config"]+" instead.")
-
 	expectations_config = json.load(open(expectations_config_file))
 
 	if kwargs["custom_dataset_module"]:
@@ -43,14 +34,13 @@ def validate(data_set, expectations_config_file, **kwargs):
 		dataset_class = getattr(custom_module, kwargs["custom_dataset_class"])
 
 	else:
-		dataset_class = ge.dataset.PandasDataSet
+		dataset_class = ge.dataset.PandasDataset
 
 	df = ge.read_csv(data_set, expectations_config=expectations_config, dataset_class=dataset_class)
 
 	result = df.validate(
-		output_format=kwargs["output_format"],
+		result_format=kwargs["result_format"],
 		catch_exceptions=kwargs["catch_exceptions"],
-		include_config=kwargs["include_config"],
 		only_return_failures=kwargs["only_return_failures"],
 	)
 
@@ -59,4 +49,4 @@ def validate(data_set, expectations_config_file, **kwargs):
 argh.dispatch_commands([
 	initialize,
 	validate,
-])
+])
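
Reviewer note: the `validate` command resolves `--custom_dataset_class` with `getattr` on a module object, but the lines that load that module fall between the hunks and are not shown. The Python sketch below illustrates the general pattern only. It assumes the module is loaded with `importlib.import_module`, which this diff does not confirm, and `load_class` is a hypothetical name.

```python
import importlib


def load_class(module_name, class_name):
    """Import a module by its dotted name and return one attribute from it.

    Hypothetical helper: the CLI's own module-loading lines are not visible
    in the diff, so importlib is an assumption here.
    """
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise ValueError(
            "Module %r has no attribute %r" % (module_name, class_name)
        )


if __name__ == "__main__":
    # A standard-library class stands in for a custom dataset class.
    dataset_class = load_class("collections", "OrderedDict")
    print(dataset_class.__name__)
```

Wrapping the `AttributeError` gives a mistyped class name a readable error instead of a bare traceback from `getattr`.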
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -52,7 +52,7 @@
 
 # General information about the project.
 project = u'great_expectations'
-copyright = u'2017, The Great Expectations Team'
+copyright = u'2018, The Great Expectations Team'
 author = u'The Great Expectations Team'
 
 # The version info for the project you're documenting, acts as replacement for
@@ -100,7 +100,7 @@
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
-html_static_path = ['_static']
+#html_static_path = ['_static']
 
 
 # -- Options for Napoleon Extension --------------------------------------------

diff --git a/docs/source/conventions.rst b/docs/source/conventions.rst
@@ -11,6 +11,6 @@ Naming conventions
 Extending Great Expectations
 ================================================================================
 
-When implementing an expectation defined in the base DataSet for a new backend, add the `@DocInherit` decorator first to use the default DataSet documentation for the expectation. That can help users of your DataSet see consistent documentation no matter which backend is implementing the great_expectations API.
+When implementing an expectation defined in the base `Dataset` for a new backend, add the `@DocInherit` decorator first to use the default dataset documentation for the expectation. That can help users of your dataset see consistent documentation no matter which backend is implementing the great_expectations API.
 
-`@DocInherit` overrides your function's __get__ method with one that will replace the local docstring with the docstring from its parent. It is defined in `dataset.util`.
+`@DocInherit` overrides your function's __get__ method with one that will replace the local docstring with the docstring from its parent. It is defined in `Dataset.util`.
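
Reviewer note: the mechanism described in conventions.rst, a `__get__` that swaps in the parent's docstring, can be illustrated with a small descriptor. The Python sketch below is a stand-in written for this note, not the library's implementation. The class is named `DocInherit` only to mirror the text, and `Base`, `Backend`, and `expect_something` are hypothetical names.

```python
import functools


class DocInherit(object):
    """Descriptor that serves a method with its parent's docstring.

    Illustrative stand-in only; the real decorator in great_expectations
    may differ in detail.
    """

    def __init__(self, method):
        self.method = method
        self.name = method.__name__

    def __get__(self, obj, cls):
        # Find the same-named attribute on the parent classes, in MRO order.
        parent = None
        for base in cls.__mro__[1:]:
            parent = getattr(base, self.name, None)
            if parent is not None:
                break

        @functools.wraps(self.method)
        def bound(*args, **kwargs):
            # Accessed on the class (obj is None): call unbound.
            if obj is None:
                return self.method(*args, **kwargs)
            return self.method(obj, *args, **kwargs)

        # Replace the local docstring with the parent's, when one exists.
        if parent is not None and parent.__doc__:
            bound.__doc__ = parent.__doc__
        return bound


class Base(object):
    def expect_something(self, column):
        """Shared documentation for expect_something."""
        raise NotImplementedError


class Backend(Base):
    @DocInherit
    def expect_something(self, column):
        """Backend-specific notes that users will not see."""
        return {"success": True, "column": column}


if __name__ == "__main__":
    print(Backend.expect_something.__doc__)
    print(Backend().expect_something("age"))
```

Because the docstring is resolved at attribute access rather than at decoration time, the parent class does not need to be known when the decorator runs. That is why the sketch uses a descriptor instead of a plain function wrapper.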