Documentation, error handling and code improvements for usability (#42)
* add to docs

* more docs updates

* add pycharm files to gitignore

* error handling and openapi validation

CSVConvert will exit if it encounters a fatal error from its inputs

OpenAPI schema is validated using openapi-spec-validator

* remove extra excepts

* improve error reporting in validate_coverage

* correct lines in test2moh and moh_template

* tiny typo fix

* improve error handling csvconvert

* improve readability of README

* format manifest info as table

* add manifest link

* doc additions and reorg

* fix link

* add links, fix typos

* read template with csv reader

Use proper csv reader to allow for quoted csvs to be read correctly

* add docstrings and documentation

* update templates

* add pydoc

* reverse float method change

* mappings docstrings

* add lazydocs

* add functions index

* change to first level heading

* add automated docs note

* add data_values dict info

* switch to pdoc3

* minor changes

* remove unused args and imports

* revert method in test csv

* updates based on PR review, thanks @daisieh
mshadbolt authored Nov 22, 2023
1 parent df88741 commit 0420284
Showing 16 changed files with 846 additions and 197 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ __pycache__/*
.DS_Store
*.pyc
.venv/
_local
.idea
.~lock*
151 changes: 93 additions & 58 deletions CSVConvert.py

Large diffs are not rendered by default.

208 changes: 132 additions & 76 deletions README.md

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions generate_mapping_docs.py
@@ -0,0 +1,25 @@
# Regenerate the Standard Functions Index in mapping_functions.md from the
# mappings module docstrings. Run as: python generate_mapping_docs.py
# (requires pdoc3, per this commit).
import subprocess


def main():
    # Generate plain-text documentation for the mappings module with pdoc
    # and echo it for visibility.
    docs = subprocess.check_output(["pdoc", "mappings"])
    print(docs.decode())
    with open("mapping_functions.md", "r") as f:
        mapping_functions_lines = f.readlines()

    # Keep everything up to the generated index, then rebuild the index
    # from the freshly generated docs.
    updated_mapping_functions = []
    for line in mapping_functions_lines:
        if line.startswith("# Standard Functions Index"):
            break
        else:
            updated_mapping_functions.append(line)
    updated_mapping_functions.append("# Standard Functions Index\n")
    updated_mapping_functions.append(
        "\n<!--- documentation below this line is generated automatically by running generate_mapping_docs.py --->\n\n")
    updated_mapping_functions.append(docs.decode())
    with open("mapping_functions.md", "w+") as f:
        f.writelines(updated_mapping_functions)


if __name__ == '__main__':
    main()
194 changes: 192 additions & 2 deletions mapping_functions.md
@@ -39,7 +39,7 @@ DONOR.INDEX.primary_diagnoses.INDEX.submitter_primary_diagnosis_id, {single_val(
DONOR.INDEX.primary_diagnoses.INDEX.date_of_diagnosis, {single_date(PRIMARY_DIAGNOSES_SHEET.date_of_diagnosis)}
```

Here, `primary_diagnoses` will be added as an array for the Donor with `submitter_donor_id`. Each entry in `primary_diagnoses` will use the values on the `PRIMARY_DIAGNOSES_SHEET` that have the same `submitter_donor_id`.
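
For illustration, the mapped output for one donor might look like the following (a hypothetical sketch; the identifiers and date are invented):

```json
{
  "submitter_donor_id": "DONOR_1",
  "primary_diagnoses": [
    {
      "submitter_primary_diagnosis_id": "PD_1",
      "date_of_diagnosis": "2021-03"
    }
  ]
}
```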

If your schema doesn't contain any instances of a particular indexed field, you can specify `NONE`:
`{indexed_on(NONE)}`
@@ -58,8 +58,11 @@ If your schema requires more complex mapping calculations, you can define an ind

In addition to mapping column names, you can also transform the values inside the cells to make them align with the schema. We've already seen the simplest case - the `single_val` function takes a single value for the named field and returns it (and should only be used when you expect one single value).

The standard functions are defined in `mappings.py`. They include functions for handling single values, list values, dates, and booleans.

Many functions take one or more `data_values` arguments as input. Each is a dictionary representing how the CSVConvert script parses a cell of the input data, with the format `{<field>:{<OBJECT_SHEET>: <value>}}`, e.g. `{'date_of_birth': {'Donor': '6 Jan 1954'}}`.
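
As a sketch of this format (the field, sheet, and value below are hypothetical):

```python
# Hypothetical parsed cell in the {<field>: {<OBJECT_SHEET>: <value>}} format:
data_values = {"date_of_birth": {"Donor": "6 Jan 1954"}}

# A mapping function typically unwraps this structure; conceptually,
# single_val returns the one raw value here (a sketch of the documented
# behaviour, not the actual mappings.py implementation):
raw_values = list(data_values["date_of_birth"].values())
print(raw_values[0])  # -> 6 Jan 1954
```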

A detailed index of all standard functions can be viewed below in the [Standard Functions Index](#standard-functions-index).

## Writing your own custom functions

@@ -118,3 +121,190 @@ represents the following JSON dict:
}
```

# Standard Functions Index

<!--- documentation below this line is generated automatically by running generate_mapping_docs.py --->
Module mappings
===============

Functions
---------


`boolean(data_values)`
: Convert value to boolean.

Args:
data_values: A string to be converted to a boolean

Returns:
A boolean based on the input,
`False` if value is in ["No", "no", "False", "false"]
`None` if value is in [`None`, "nan", "NaN", "NAN"]
`True` otherwise
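
A minimal sketch of this truth table (not the actual `mappings.py` implementation):

```python
def boolean_sketch(value):
    # Documented behaviour: these strings map to False...
    if value in ["No", "no", "False", "false"]:
        return False
    # ...null-ish markers map to None...
    if value in [None, "nan", "NaN", "NAN"]:
        return None
    # ...and everything else is treated as True.
    return True

print(boolean_sketch("no"))   # False
print(boolean_sketch("NAN"))  # None
print(boolean_sketch("Yes"))  # True
```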


`concat_vals(data_values)`
: Concatenate several data values.

Args:
data_values: a values dict with a list of values

Returns:
A concatenated string


`date(data_values)`
: Format a list of dates to ISO standard YYYY-MM.

Parses a list of strings representing dates into a list of strings with dates in ISO format YYYY-MM.

Args:
data_values: a value dict with a list of date-like strings

Returns:
a list of dates in YYYY-MM format or None if blank/empty/unparseable
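
A hedged sketch of the described parsing, assuming `python-dateutil` is available (the actual implementation may differ):

```python
from dateutil import parser  # assumption: python-dateutil is installed


def to_year_month(date_string):
    # Parse a free-form date string and format it as YYYY-MM,
    # returning None when the string cannot be parsed.
    try:
        return parser.parse(date_string).strftime("%Y-%m")
    except (ValueError, OverflowError):
        return None

print(to_year_month("6 Jan 1954"))  # 1954-01
print(to_year_month("not a date"))  # None
```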


`flat_list_val(data_values)`
: Take a list mapping and break up any stringified lists into multiple values in the list.

Attempts to use ast.literal_eval() to parse the list, uses split(',') if this fails.

Args:
data_values: a values dict with a stringified list, e.g. "['a','b','c']"
Returns:
A parsed list of items in the list, e.g. ['a', 'b', 'c']
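
A sketch of the documented strategy, i.e. `ast.literal_eval()` with a comma-split fallback (not the actual `mappings.py` code):

```python
import ast


def parse_stringified_list(cell):
    # Try to parse the cell as a Python literal first...
    try:
        return list(ast.literal_eval(cell))
    except (ValueError, SyntaxError):
        # ...and fall back to a simple comma split if that fails.
        return [item.strip() for item in cell.split(",")]

print(parse_stringified_list("['a','b','c']"))  # ['a', 'b', 'c']
print(parse_stringified_list("a,b,c"))          # ['a', 'b', 'c']
```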


`float(data_values)`
: Convert a value to a float.

Args:
data_values: A values dict

Returns:
A values dict with a string or integer converted to a float, or None if the value is null

Raises:
ValueError if float() cannot convert the value.


`has_value(data_values)`
: Returns a boolean based on whether the key in the mapping has a value.


`index_val(data_values)`
: Take a mapping with possibly multiple values from multiple sheets and return an array.


`indexed_on(data_values)`
: Default indexing value for arrays.

Args:
data_values: a values dict of identifiers to be indexed

Returns:
a dict of the format:
{"field": <identifier_field>,"sheet_name": <sheet_name>,"values": [<identifiers>]}


`integer(data_values)`
: Convert a value to an integer.

Args:
data_values: a values dict with value to be converted to an int
Returns:
an integer version of the input value
Raises:
ValueError if int() cannot convert the input


`list_val(data_values)`
: Takes a mapping with possibly multiple values from multiple sheets and returns an array of values.

Args:
data_values: a values dict with a list of values
Returns:
The list of values


`moh_indexed_on_donor_if_others_absent(data_values)`
: Maps an object to a donor if not otherwise linked.

Specifically for the FollowUp object which can be linked to multiple objects.

Args:
**data_values: any number of values dicts with lists of identifiers. NOTE: the values dict with donor identifiers
must be specified first.

Returns:
a dict of the format:

{'field': <field>, 'sheet': <sheet>, 'values': [<identifier or None>, <identifier or None>...]}

Where the 'values' list contains a donor identifier if it should be linked to that donor or None if already
linked to another object.


`ontology_placeholder(data_values)`
: Placeholder function to make a fake ontology entry.

Should only be used for testing.

Args:
data_values: a values dict with a string value representing an ontology label

Returns:
a dict of the format:
{"id": "placeholder","label": data_values}


`pipe_delim(data_values)`
: Takes a string and splits it into an array based on a pipe delimiter.

Args:
data_values: values dict with single pipe-delimited string, e.g. "a|b|c"

Returns:
a list of strings split by pipe, e.g. ["a","b","c"]


`placeholder(data_values)`
: Return a dict with a placeholder key.


`single_date(data_values)`
: Parses a single date to YYYY-MM format.

Args:
data_values: a value dict with a date

Returns:
a string of the format YYYY-MM, or None if blank/unparseable


`single_val(data_values)`
: Parse a values dict and return the input as a single value.

Args:
data_values: a dict with values to be squashed

Returns:
A single value with any null values removed
None if list is empty or contains only 'nan', 'NaN', 'NAN'

Raises:
MappingError if multiple values found

Classes
-------

`MappingError(value)`
: Common base class for all non-exit exceptions.

### Ancestors (in MRO)

* builtins.Exception
* builtins.BaseException
