Documentation, error handling and code improvements for usability (#42)
* add to docs

* more docs updates

* add pycharm files to gitignore

* error handling and openapi validation

CSVConvert will exit if it encounters a fatal error from its inputs

OpenAPI schema is validated using openapi-spec-validator

* remove extra excepts

* improve error reporting in validate_coverage

* correct lines in test2moh and moh_template

* tiny typo fix

* improve error handling csvconvert

* improve readability of README

* format manifest info as table

* add manifest link

* doc additions and reorg

* fix link

* add links, fix typos

* read template with csv reader

Use proper csv reader to allow for quoted csvs to be read correctly

* add docstrings and documentation

* update templates

* add pydoc

* reverse float method change

* mappings docstrings

* add lazydocs

* add functions index

* change to first level heading

* add automated docs note

* add data_values dict info

* switch to pdoc3

* minor changes

* remove unused args and imports

* revert method in test csv

* updates based on PR review, thanks @daisieh
mshadbolt authored Nov 22, 2023
1 parent df88741 commit 0420284
Showing 16 changed files with 846 additions and 197 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -5,3 +5,6 @@ __pycache__/*
.DS_Store
*.pyc
.venv/
_local
.idea
.~lock*
151 changes: 93 additions & 58 deletions CSVConvert.py

Large diffs are not rendered by default.

208 changes: 132 additions & 76 deletions README.md

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions generate_mapping_docs.py
@@ -0,0 +1,25 @@
# Regenerate the Standard Functions Index in mapping_functions.md from the
# mappings module docstrings. Run as: python generate_mapping_docs.py
# (requires pdoc3, per this commit).
import subprocess


def main():
    # Generate plain-text documentation for the mappings module with pdoc
    # and echo it for visibility.
    docs = subprocess.check_output(["pdoc", "mappings"])
    print(docs.decode())
    with open("mapping_functions.md", "r") as f:
        mapping_functions_lines = f.readlines()

    # Keep everything up to the generated index, then rebuild the index
    # from the freshly generated docs.
    updated_mapping_functions = []
    for line in mapping_functions_lines:
        if line.startswith("# Standard Functions Index"):
            break
        else:
            updated_mapping_functions.append(line)
    updated_mapping_functions.append("# Standard Functions Index\n")
    updated_mapping_functions.append(
        "\n<!--- documentation below this line is generated automatically by running generate_mapping_docs.py --->\n\n")
    updated_mapping_functions.append(docs.decode())
    with open("mapping_functions.md", "w+") as f:
        f.writelines(updated_mapping_functions)


if __name__ == '__main__':
    main()
194 changes: 192 additions & 2 deletions mapping_functions.md
@@ -39,7 +39,7 @@ DONOR.INDEX.primary_diagnoses.INDEX.submitter_primary_diagnosis_id, {single_val(
DONOR.INDEX.primary_diagnoses.INDEX.date_of_diagnosis, {single_date(PRIMARY_DIAGNOSES_SHEET.date_of_diagnosis)}
```

Here, `primary_diagnoses` will be added as an array for the Donor with `submitter_donor_id`. Each entry in `primary_diagnoses` will use the values on the `PRIMARY_DIAGNOSES_SHEET` that have the same `submitter_donor_id`.
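
For illustration, the mapped output for one donor might look like the following (a hypothetical sketch; the identifiers and date are invented):

```json
{
  "submitter_donor_id": "DONOR_1",
  "primary_diagnoses": [
    {
      "submitter_primary_diagnosis_id": "PD_1",
      "date_of_diagnosis": "2021-03"
    }
  ]
}
```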

If your schema doesn't contain any instances of a particular indexed field, you can specify `NONE`:
`{indexed_on(NONE)}`
@@ -58,8 +58,11 @@ If your schema requires more complex mapping calculations, you can define an ind

In addition to mapping column names, you can also transform the values inside the cells to make them align with the schema. We've already seen the simplest case - the `single_val` function takes a single value for the named field and returns it (and should only be used when you expect one single value).

The standard functions are defined in `mappings.py`. They include functions for handling single values, list values, dates, and booleans.

Many functions take one or more `data_values` arguments as input. Each is a dictionary representing how the CSVConvert script parses a cell of the input data, with the format `{<field>:{<OBJECT_SHEET>: <value>}}`, e.g. `{'date_of_birth': {'Donor': '6 Jan 1954'}}`.
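
As a sketch of this format (the field, sheet, and value below are hypothetical):

```python
# Hypothetical parsed cell in the {<field>: {<OBJECT_SHEET>: <value>}} format:
data_values = {"date_of_birth": {"Donor": "6 Jan 1954"}}

# A mapping function typically unwraps this structure; conceptually,
# single_val returns the one raw value here (a sketch of the documented
# behaviour, not the actual mappings.py implementation):
raw_values = list(data_values["date_of_birth"].values())
print(raw_values[0])  # -> 6 Jan 1954
```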

A detailed index of all standard functions can be viewed below in the [Standard Functions Index](#standard-functions-index).

## Writing your own custom functions

@@ -118,3 +121,190 @@ represents the following JSON dict:
}
```

# Standard Functions Index

<!--- documentation below this line is generated automatically by running generate_mapping_docs.py --->
Module mappings
===============

Functions
---------


`boolean(data_values)`
: Convert value to boolean.

Args:
data_values: A string to be converted to a boolean

Returns:
A boolean based on the input,
`False` if value is in ["No", "no", "False", "false"]
`None` if value is in [`None`, "nan", "NaN", "NAN"]
`True` otherwise
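
A minimal sketch of this truth table (not the actual `mappings.py` implementation):

```python
def boolean_sketch(value):
    # Documented behaviour: these strings map to False...
    if value in ["No", "no", "False", "false"]:
        return False
    # ...null-ish markers map to None...
    if value in [None, "nan", "NaN", "NAN"]:
        return None
    # ...and everything else is treated as True.
    return True

print(boolean_sketch("no"))   # False
print(boolean_sketch("NAN"))  # None
print(boolean_sketch("Yes"))  # True
```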


`concat_vals(data_values)`
: Concatenate several data values.

Args:
data_values: a values dict with a list of values

Returns:
A concatenated string


`date(data_values)`
: Format a list of dates to ISO standard YYYY-MM.

Parses a list of strings representing dates into a list of strings with dates in ISO format YYYY-MM.

Args:
data_values: a value dict with a list of date-like strings

Returns:
a list of dates in YYYY-MM format or None if blank/empty/unparseable
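
A hedged sketch of the described parsing, assuming `python-dateutil` is available (the actual implementation may differ):

```python
from dateutil import parser  # assumption: python-dateutil is installed


def to_year_month(date_string):
    # Parse a free-form date string and format it as YYYY-MM,
    # returning None when the string cannot be parsed.
    try:
        return parser.parse(date_string).strftime("%Y-%m")
    except (ValueError, OverflowError):
        return None

print(to_year_month("6 Jan 1954"))  # 1954-01
print(to_year_month("not a date"))  # None
```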


`flat_list_val(data_values)`
: Take a list mapping and break up any stringified lists into multiple values in the list.

Attempts to use ast.literal_eval() to parse the list, uses split(',') if this fails.

Args:
data_values: a values dict with a stringified list, e.g. "['a','b','c']"
Returns:
A parsed list of items in the list, e.g. ['a', 'b', 'c']
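
A sketch of the documented strategy, i.e. `ast.literal_eval()` with a comma-split fallback (not the actual `mappings.py` code):

```python
import ast


def parse_stringified_list(cell):
    # Try to parse the cell as a Python literal first...
    try:
        return list(ast.literal_eval(cell))
    except (ValueError, SyntaxError):
        # ...and fall back to a simple comma split if that fails.
        return [item.strip() for item in cell.split(",")]

print(parse_stringified_list("['a','b','c']"))  # ['a', 'b', 'c']
print(parse_stringified_list("a,b,c"))          # ['a', 'b', 'c']
```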


`float(data_values)`
: Convert a value to a float.

Args:
data_values: A values dict

Returns:
A values dict with a string or integer converted to a float, or None if the value is null

Raises:
ValueError if float() cannot convert the value.


`has_value(data_values)`
: Returns a boolean based on whether the key in the mapping has a value.


`index_val(data_values)`
: Take a mapping with possibly multiple values from multiple sheets and return an array.


`indexed_on(data_values)`
: Default indexing value for arrays.

Args:
data_values: a values dict of identifiers to be indexed

Returns:
a dict of the format:
{"field": <identifier_field>,"sheet_name": <sheet_name>,"values": [<identifiers>]}


`integer(data_values)`
: Convert a value to an integer.

Args:
data_values: a values dict with value to be converted to an int
Returns:
an integer version of the input value
Raises:
ValueError if int() cannot convert the input


`list_val(data_values)`
: Takes a mapping with possibly multiple values from multiple sheets and returns an array of values.

Args:
data_values: a values dict with a list of values
Returns:
The list of values


`moh_indexed_on_donor_if_others_absent(data_values)`
: Maps an object to a donor if not otherwise linked.

Specifically for the FollowUp object which can be linked to multiple objects.

Args:
**data_values: any number of values dicts with lists of identifiers. NOTE: the values dict with donor identifiers
must be specified first.

Returns:
a dict of the format:

{'field': <field>, 'sheet': <sheet>, 'values': [<identifier or None>, <identifier or None>...]}

Where the 'values' list contains a donor identifier if it should be linked to that donor or None if already
linked to another object.


`ontology_placeholder(data_values)`
: Placeholder function to make a fake ontology entry.

Should only be used for testing.

Args:
data_values: a values dict with a string value representing an ontology label

Returns:
a dict of the format:
{"id": "placeholder","label": data_values}


`pipe_delim(data_values)`
: Takes a string and splits it into an array based on a pipe delimiter.

Args:
data_values: values dict with single pipe-delimited string, e.g. "a|b|c"

Returns:
a list of strings split by pipe, e.g. ["a","b","c"]


`placeholder(data_values)`
: Return a dict with a placeholder key.


`single_date(data_values)`
: Parses a single date to YYYY-MM format.

Args:
data_values: a value dict with a date

Returns:
a string of the format YYYY-MM, or None if blank/unparseable


`single_val(data_values)`
: Parse a values dict and return the input as a single value.

Args:
data_values: a dict with values to be squashed

Returns:
A single value with any null values removed
None if list is empty or contains only 'nan', 'NaN', 'NAN'

Raises:
MappingError if multiple values found

Classes
-------

`MappingError(value)`
: Common base class for all non-exit exceptions.

### Ancestors (in MRO)

* builtins.Exception
* builtins.BaseException
