DIG-1359: Documentation, error handling and code improvements for usability #42

Merged
merged 33 commits into from
Nov 22, 2023
33 commits
3132a0a
add to docs
mshadbolt Nov 1, 2023
9c652e1
more docs updates
mshadbolt Nov 1, 2023
c422aad
add pycharm files to gitignore
mshadbolt Nov 1, 2023
debe514
error handling and openapi validation
mshadbolt Nov 1, 2023
32b71ba
remove extra excepts
mshadbolt Nov 1, 2023
88be316
improve error reporting in validate_coverage
mshadbolt Nov 8, 2023
579d4aa
correct lines in test2moh and moh_template
mshadbolt Nov 8, 2023
9c6ec7a
tiny typo fix
mshadbolt Nov 8, 2023
a3c968b
improve error handling csvconvert
mshadbolt Nov 8, 2023
2e24452
improve readability of README
mshadbolt Nov 8, 2023
a1f4bee
format manifest info as table
mshadbolt Nov 8, 2023
0351462
add manifest link
mshadbolt Nov 8, 2023
be35d12
doc additions and reorg
mshadbolt Nov 9, 2023
1d4d97e
fix link
mshadbolt Nov 9, 2023
89410ec
add links, fix typos
mshadbolt Nov 9, 2023
8c8f05a
read template with csv reader
mshadbolt Nov 14, 2023
30899b7
add docstrings and documentation
mshadbolt Nov 14, 2023
e142f4d
update templates
mshadbolt Nov 14, 2023
1606897
add pydoc
mshadbolt Nov 14, 2023
c40ee83
reverse float method change
mshadbolt Nov 14, 2023
ad4984b
mappings docstrings
mshadbolt Nov 14, 2023
eb539e4
add lazydocs
mshadbolt Nov 14, 2023
7b3535d
add functions index
mshadbolt Nov 14, 2023
9ebf9bc
change to first level heading
mshadbolt Nov 14, 2023
e04aa8c
add automated docs note
mshadbolt Nov 14, 2023
786f25b
add data_values dict info
mshadbolt Nov 14, 2023
5dcd029
switch to pdoc3
mshadbolt Nov 17, 2023
e516bb6
minor changes
mshadbolt Nov 21, 2023
56c2728
remove unused args and imports
mshadbolt Nov 21, 2023
2eba9b4
revert method in test csv
mshadbolt Nov 21, 2023
a8aefad
Merge branch 'main' into mshadbolt/docs-and-error-handling
mshadbolt Nov 21, 2023
70f3dc1
updates based on PR review, thanks @daisieh
mshadbolt Nov 22, 2023
05758f9
Merge branch 'mshadbolt/docs-and-error-handling' of https://github.co…
mshadbolt Nov 22, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ __pycache__/*
.DS_Store
*.pyc
.venv/
_local
.idea
.~lock*
151 changes: 93 additions & 58 deletions CSVConvert.py

Large diffs are not rendered by default.

208 changes: 132 additions & 76 deletions README.md

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions generate_mapping_docs.py
@@ -0,0 +1,25 @@
import subprocess


def main():
    docs = subprocess.check_output(["pdoc", "mappings"])
    print(docs.decode())
    with open("mapping_functions.md", "r") as f:
        mapping_functions_lines = f.readlines()

    updated_mapping_functions = []
    for line in mapping_functions_lines:
        if line.startswith("# Standard Functions Index"):
            break
        else:
            updated_mapping_functions.append(line)
    updated_mapping_functions.append("# Standard Functions Index\n")
    updated_mapping_functions.append(
        "\n<!--- documentation below this line is generated automatically by running generate_mapping_docs.py --->\n\n")
    updated_mapping_functions.append(docs.decode())
    with open("mapping_functions.md", "w+") as f:
        f.writelines(updated_mapping_functions)


if __name__ == '__main__':
    main()
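
The splice step in `generate_mapping_docs.py` (keep everything above the index heading, then append freshly generated docs) can be pulled out into a pure function that is easy to test without running `pdoc`. The following is a sketch only: `splice_generated_docs` and `MARKER` are hypothetical names that do not appear in the PR.

```python
# Hypothetical refactoring sketch; not part of the PR.
MARKER = "# Standard Functions Index"
NOTE = ("\n<!--- documentation below this line is generated automatically "
        "by running generate_mapping_docs.py --->\n\n")


def splice_generated_docs(existing_lines, generated_docs):
    """Keep hand-written lines above MARKER, then append the generated docs."""
    kept = []
    for line in existing_lines:
        if line.startswith(MARKER):
            break  # everything from the marker onwards is regenerated
        kept.append(line)
    kept.append(MARKER + "\n")
    kept.append(NOTE)
    kept.append(generated_docs)
    return kept


result = splice_generated_docs(
    ["intro\n", "# Standard Functions Index\n", "stale docs\n"],
    "fresh docs\n",
)
```

Because the function never touches the filesystem, stale content below the marker can be checked for removal in a plain unit test.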
194 changes: 192 additions & 2 deletions mapping_functions.md
@@ -39,7 +39,7 @@ DONOR.INDEX.primary_diagnoses.INDEX.submitter_primary_diagnosis_id, {single_val(
DONOR.INDEX.primary_diagnoses.INDEX.date_of_diagnosis, {single_date(PRIMARY_DIAGNOSES_SHEET.date_of_diagnosis)}
```

Here, `primary_diagnoses` will be added as an an array for the Donor with `submitter_donor_id`. Each entry in `primary_diagnoses` will use the values on the `PRIMARY_DIAGNOSES_SHEET` that have the same `submitter_donor_id`.
Here, `primary_diagnoses` will be added as an array for the Donor with `submitter_donor_id`. Each entry in `primary_diagnoses` will use the values on the `PRIMARY_DIAGNOSES_SHEET` that have the same `submitter_donor_id`.

If your schema doesn't contain any instances of a particular indexed field, you can specify `NONE`:
`{indexed_on(NONE)}`
@@ -58,8 +58,11 @@ If your schema requires more complex mapping calculations, you can define an ind

In addition to mapping column names, you can also transform the values inside the cells to make them align with the schema. We've already seen the simplest case - the `single_val` function takes a single value for the named field and returns it (and should only be used when you expect one single value).

The standard functions are defined in `mappings.py`. They include functions for handling single values, list values, dates, and booleans.
The standard functions are defined in `mappings.py`. They include functions for handling single values, list values, dates, and booleans.

Many functions take one or more `data_values` arguments as input. These are a dictionary representing how the CSVConvert script parses each cell of the input data. It is a dictionary of the format `{<field>:{<OBJECT_SHEET>: <value>}}`, e.g. `{'date_of_birth': {'Donor': '6 Jan 1954'}}`.
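
As a rough illustration of that shape, the sketch below walks a `data_values` dict and pulls out the cell value. `first_value` is a hypothetical helper written for this example and is not one of the functions in `mappings.py`.

```python
# Illustrative only: the data_values shape described above.
data_values = {"date_of_birth": {"Donor": "6 Jan 1954"}}


def first_value(data_values):
    """Return the first cell value found in a data_values dict (hypothetical helper)."""
    for field, sheets in data_values.items():
        for sheet_name, value in sheets.items():
            return value
    return None  # nothing to return for an empty dict


print(first_value(data_values))
```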

A detailed index of all standard functions can be viewed below in the [Standard functions index](#Standard-functions-index).

## Writing your own custom functions

@@ -118,3 +121,190 @@ represents the following JSON dict:
}

```

# Standard Functions Index

<!--- documentation below this line is generated automatically by running generate_mapping_docs.py --->
Module mappings
===============

Functions
---------


`boolean(data_values)`
: Convert value to boolean.

Args:
data_values: A string to be converted to a boolean

Returns:
A boolean based on the input,
`False` if value is in ["No", "no", "False", "false"]
`None` if value is in [`None`, "nan", "NaN", "NAN"]
`True` otherwise
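
The three rules above can be sketched as follows. This is an illustrative stand-in, not the code in `mappings.py`: `to_boolean` is a hypothetical name, and it assumes a bare value rather than a values dict.

```python
def to_boolean(value):
    """Apply the documented rules to a single bare value (hypothetical sketch)."""
    if value is None or value in ("nan", "NaN", "NAN"):
        return None   # null-like input stays null
    if value in ("No", "no", "False", "false"):
        return False  # explicit negatives
    return True       # anything else counts as true
```

Note that under these rules any other non-null string, including `"0"` or `"N"`, maps to `True`.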


`concat_vals(data_values)`
: Concatenate several data values

Args:
data_values: a values dict with a list of values

Returns:
A concatenated string


`date(data_values)`
: Format a list of dates to ISO standard YYYY-MM

Parses a list of strings representing dates into a list of strings with dates in ISO format YYYY-MM.

Args:
data_values: a value dict with a list of date-like strings

Returns:
a list of dates in YYYY-MM format or None if blank/empty/unparseable


`flat_list_val(data_values)`
: Take a list mapping and break up any stringified lists into multiple values in the list.

Attempts to use ast.literal_eval() to parse the list, uses split(',') if this fails.

Args:
data_values: a values dict with a stringified list, e.g. "['a','b','c']"
Returns:
A parsed list of items in the list, e.g. ['a', 'b', 'c']
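
The parse-then-fall-back behaviour described above might look like the sketch below. `parse_stringified_list` is a hypothetical name, and the function assumes a plain string instead of a values dict, so treat it as an approximation of the idea and not the actual implementation.

```python
import ast


def parse_stringified_list(text):
    """Try ast.literal_eval first; fall back to splitting on commas (sketch)."""
    try:
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a valid Python literal, so treat it as comma-separated text.
        return [item.strip() for item in text.split(",")]
    if isinstance(parsed, (list, tuple)):
        return list(parsed)
    return [parsed]  # a single literal becomes a one-item list
```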


`float(data_values)`
: Convert a value to a float.

Args:
data_values: A values dict

Returns:
A values dict with a string or integer converted to a float or None if null value

Raises:
ValueError by float() if it cannot convert to float.


`has_value(data_values)`
: Returns a boolean based on whether the key in the mapping has a value.


`index_val(data_values)`
: Take a mapping with possibly multiple values from multiple sheets and return an array.


`indexed_on(data_values)`
: Default indexing value for arrays.

Args:
data_values: a values dict of identifiers to be indexed

Returns:
a dict of the format:
{"field": <identifier_field>,"sheet_name": <sheet_name>,"values": [<identifiers>]}


`integer(data_values)`
: Convert a value to an integer.

Args:
data_values: a values dict with value to be converted to an int
Returns:
an integer version of the input value
Raises:
ValueError if int() cannot convert the input


`list_val(data_values)`
: Takes a mapping with possibly multiple values from multiple sheets and returns an array of values.

Args:
data_values: a values dict with a list of values
Returns:
The list of values


`moh_indexed_on_donor_if_others_absent(data_values)`
: Maps an object to a donor if not otherwise linked.

Specifically for the FollowUp object which can be linked to multiple objects.

Args:
**data_values: any number of values dicts with lists of identifiers, NOTE: values dict with donor identifiers
must be specified first.

Returns:
a dict of the format:

{'field': <field>, 'sheet': <sheet>, 'values': [<identifier or None>, <identifier or None>...]}

Where the 'values' list contains a donor identifier if it should be linked to that donor or None if already
linked to another object.


`ontology_placeholder(data_values)`
: Placeholder function to make a fake ontology entry.

Should only be used for testing.

Args:
data_values: a values dict with a string value representing an ontology label

Returns:
a dict of the format:
{"id": "placeholder","label": data_values}


`pipe_delim(data_values)`
: Takes a string and splits it into an array based on a pipe delimiter.

Args:
data_values: values dict with single pipe-delimited string, e.g. "a|b|c"

Returns:
a list of strings split by pipe, e.g. ["a","b","c"]


`placeholder(data_values)`
: Return a dict with a placeholder key.


`single_date(data_values)`
: Parses a single date to YYYY-MM format.

Args:
data_values: a value dict with a date

Returns:
a string of the format YYYY-MM, or None if blank/unparseable
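
A minimal sketch of the YYYY-MM conversion, using only the standard library. The date parser used in `mappings.py` is not shown in this diff, so the fixed `fmt` argument and the name `to_year_month` are assumptions made for illustration.

```python
from datetime import datetime


def to_year_month(text, fmt="%d %b %Y"):
    """Convert a date string such as '6 Jan 1954' to 'YYYY-MM' (hypothetical sketch)."""
    if not text or not text.strip():
        return None  # blank input
    try:
        return datetime.strptime(text.strip(), fmt).strftime("%Y-%m")
    except ValueError:
        return None  # unparseable input
```

A fixed format string is far stricter than a general-purpose date parser; it is used here only to keep the example dependency-free.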


`single_val(data_values)`
: Parse a values dict and return the input as a single value.

Args:
data_values: a dict with values to be squashed

Returns:
A single value with any null values removed
None if list is empty or contains only 'nan', 'NaN', 'NAN'

Raises:
MappingError if multiple values found
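
The squashing rule can be sketched as below. For simplicity the example assumes a plain list of cell values instead of a values dict; `squash_to_single` is a hypothetical name, and the `MappingError` class here is a local stand-in for the one listed under Classes.

```python
class MappingError(Exception):
    """Local stand-in for the MappingError class in this index."""


NULL_STRINGS = ("nan", "NaN", "NAN")


def squash_to_single(values):
    """Drop null-like entries and return the one remaining value (sketch)."""
    kept = [v for v in values if v is not None and v not in NULL_STRINGS]
    if not kept:
        return None
    if len(kept) > 1:
        raise MappingError(f"Expected a single value, found {len(kept)}: {kept}")
    return kept[0]
```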

Classes
-------

`MappingError(value)`
: Common base class for all non-exit exceptions.

### Ancestors (in MRO)

* builtins.Exception
* builtins.BaseException