DIG-1150: Template update and documentation for DateIntervals (#53)
* update template script

* add build to gitignore

* update templates to intervals

* update moh_diffs

* update mapping functions doc
mshadbolt authored Feb 29, 2024
1 parent a9ba7c7 commit 815cbfc
Showing 6 changed files with 277 additions and 192 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -7,4 +7,5 @@ __pycache__/*
.venv/
_local
.idea
-.~lock*
+.~lock*
+build/
125 changes: 70 additions & 55 deletions mapping_functions.md
@@ -64,6 +64,20 @@ Many functions take one or more `data_values` arguments as input. These are a di

A detailed index of all standard functions can be viewed below in the [Standard functions index](#Standard-functions-index).

### Dealing with Dates

As of version 2.1 of the [MoHCCN Data Model](https://www.marathonofhopecancercentres.ca/docs/default-source/policies-and-guidelines/clinical-data-model-v2.1/mohccn-clinical-data-model-release-notes_sep2023.pdf?Status=Master&sfvrsn=19ece028_3), dates must be converted into date intervals relative to the earliest date of diagnosis. Support for this was added in clinical_ETL_code v2.0.0. To convert dates to date intervals, provide a `reference_date` in `manifest.yml`. This can be an absolute date or a function that calculates a date from the input data, e.g. `earliest_date(Donor.date_resolution, PrimaryDiagnosis.date_of_diagnosis)`. In the mapping CSV, use the built-in `date_interval()` mapping function to calculate the date interval for any date-type field, e.g.:

```commandline
DONOR.INDEX.date_of_birth, {date_interval(Donor.date_of_birth)}
```

If the input data already contains pre-calculated date intervals as integers, use the `int_to_date_interval_json()` function to transform each integer into the required DateInterval JSON object, e.g.:

```commandline
DONOR.INDEX.date_of_death, {int_to_date_interval_json(Donor.date_of_death)}
```
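
To make the idea of an interval concrete, here is a minimal Python sketch of how a date might be expressed relative to a reference date. It is not the clinical_ETL_code implementation: the helper name `sketch_date_interval`, its parameters, and the calendar-month arithmetic are assumptions made for illustration. Only the `month_interval` and `day_interval` keys are taken from the function index in this document.

```python
from datetime import date


def sketch_date_interval(target: date, reference: date, date_resolution: str = "month") -> dict:
    """Hypothetical helper: express `target` as an offset from `reference`.

    Assumption: the month interval is the difference in calendar months,
    ignoring the day of the month. The real date_interval() may differ.
    """
    # Difference in calendar months; negative when target precedes reference.
    months = (target.year - reference.year) * 12 + (target.month - reference.month)
    interval = {"month_interval": months}

    # Assumption: a day interval is only reported at day resolution.
    if date_resolution == "day":
        interval["day_interval"] = (target - reference).days
    return interval


# Example: a treatment date relative to the earliest date of diagnosis.
example = sketch_date_interval(date(2021, 3, 15), date(2020, 1, 10))
```

Here `example["month_interval"]` is 14. Dates before the reference date, such as a date of birth, produce negative intervals.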

## Writing your own custom functions

If the data cannot be transformed with one of the standard functions, you can define your own. In your data directory (the one that contains `manifest.yml`) create a python file (let's assume you called it `new_cohort.py`) and add the name of that file as the `functions` entry in the manifest (without the .py extension).
@@ -140,202 +154,203 @@ Module mappings
Functions
---------


`boolean(data_values)`
: Convert value to boolean.

Args:
data_values: A string to be converted to a boolean

Returns:
A boolean based on the input,
`False` if value is in ["No", "no", "N", "n", "False", "false", "F", "f"]
`True` if value is in ["Yes", "yes", "Y", "y", "True", "true", "T", "t"]
None if value is in [`None`, "nan", "NaN", "NAN"]
None otherwise
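
The lookup described above can be sketched as follows. This is an illustrative reading of the docstring, not the library source: `sketch_boolean` is a hypothetical name, and it takes a plain string rather than the `data_values` dict that the real function receives.

```python
from typing import Optional

# Accepted spellings, copied from the docstring above.
FALSE_VALUES = {"No", "no", "N", "n", "False", "false", "F", "f"}
TRUE_VALUES = {"Yes", "yes", "Y", "y", "True", "true", "T", "t"}


def sketch_boolean(value: Optional[str]) -> Optional[bool]:
    """Hypothetical helper mirroring the documented truth table."""
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    # None, "nan", "NaN", "NAN" and any unrecognised string all map to None.
    return None
```

Note that matching is exact: a spelling outside the two sets, such as "YES", falls through to `None` in this sketch.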


`concat_vals(data_values)`
: Concatenate several data values

Args:
data_values: a values dict with a list of values

Returns:
A concatenated string


`date(data_values)`
: Format a list of dates to ISO standard YYYY-MM

Parses a list of strings representing dates into a list of strings with dates in ISO format YYYY-MM.

Args:
data_values: a value dict with a list of date-like strings

Returns:
a list of dates in YYYY-MM format or None if blank/empty/unparseable


`date_interval(data_values)`
: Calculates a date interval from a given date relative to the reference date specified in the manifest.

Args:
data_values: a values dict with a date

Returns:
A dictionary with calculated month_interval and optionally a day_interval depending on the specified
date_resolution.


`earliest_date(data_values)`
: Calculates the earliest date from a set of dates

Args:
data_values: A values dict of dates of diagnosis and date_resolution

Returns:
A dictionary containing the earliest date (`offset`) as a date object and the provided `date_resolution`


`flat_list_val(data_values)`
: Take a list mapping and break up any stringified lists into multiple values in the list.

Attempts to use ast.literal_eval() to parse the list, uses split(',') if this fails.

Args:
data_values: a values dict with a stringified list, e.g. "['a','b','c']"
Returns:
A parsed list of items in the list, e.g. ['a', 'b', 'c']
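
The parse-then-fall-back strategy described above can be sketched like this. The function name `sketch_flatten` is hypothetical and it operates on a single raw string rather than a values dict; only `ast.literal_eval()` and `split(',')` come from the docstring.

```python
import ast


def sketch_flatten(raw: str) -> list:
    """Hypothetical helper: turn a stringified list into a real list."""
    try:
        # Safely evaluate Python literals such as "['a','b','c']".
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a valid literal: fall back to a plain comma split.
        return [item.strip() for item in raw.split(",")]
    # A literal that is not a sequence (e.g. "5") becomes a one-item list.
    if isinstance(parsed, (list, tuple)):
        return list(parsed)
    return [parsed]
```

The fallback branch is what lets unquoted input such as `a, b, c` through, since `ast.literal_eval()` rejects bare names.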


`floating(data_values)`
: Convert a value to a float.

Args:
data_values: A values dict

Returns:
A values dict with a string or integer converted to a float or None if null value

Raises:
ValueError by float() if it cannot convert to float.


`has_value(data_values)`
: Returns a boolean based on whether the key in the mapping has a value.


`index_val(data_values)`
: Take a mapping with possibly multiple values from multiple sheets and return an array.


`indexed_on(data_values)`
: Default indexing value for arrays.

Args:
data_values: a values dict of identifiers to be indexed

Returns:
a dict of the format:
{"field": <identifier_field>,"sheet_name": <sheet_name>,"values": [<identifiers>]}


`int_to_date_interval_json(data_values)`
: Converts an integer date interval into JSON format.

Args:
data_values: a values dict with an integer.

Returns:
A dictionary with a calculated month_interval and optionally a day_interval depending on the specified date_resolution in the donor file.


`integer(data_values)`
: Convert a value to an integer.

Args:
data_values: a values dict with value to be converted to an int
Returns:
an integer version of the input value
Raises:
ValueError if int() cannot convert the input


`list_val(data_values)`
: Takes a mapping with possibly multiple values from multiple sheets and returns an array of values.

Args:
data_values: a values dict with a list of values
Returns:
The list of values


`moh_indexed_on_donor_if_others_absent(data_values)`
: Maps an object to a donor if not otherwise linked.

Specifically for the FollowUp object which can be linked to multiple objects.

Args:
**data_values: any number of values dicts with lists of identifiers, NOTE: values dict with donor identifiers
must be specified first.

Returns:
a dict of the format:

{'field': <field>, 'sheet': <sheet>, 'values': [<identifier or None>, <identifier or None>...]}

Where the 'values' list contains a donor identifier if it should be linked to that donor or None if already
linked to another object.
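
A rough sketch of the rule described above, under stated assumptions: the identifier lists are parallel (one entry per FollowUp row), missing links are represented as `None`, and the helper name `sketch_index_on_donor` is hypothetical. It returns only the `values` list, not the full dict.

```python
def sketch_index_on_donor(donor_ids: list, *other_id_lists: list) -> list:
    """Hypothetical helper: keep a donor id only when no other link exists."""
    values = []
    for row, donor_id in enumerate(donor_ids):
        # A row is already linked if any other identifier is present for it.
        linked_elsewhere = any(ids[row] is not None for ids in other_id_lists)
        values.append(None if linked_elsewhere else donor_id)
    return values


# Three follow-ups: the first has no other link, the second is tied to a
# primary diagnosis, the third to a treatment.
values = sketch_index_on_donor(
    ["DONOR_1", "DONOR_2", "DONOR_3"],
    [None, "PD_2", None],
    [None, None, "TR_3"],
)
```

In this example only the first follow-up keeps its donor identifier, so `values` is `["DONOR_1", None, None]`.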


`ontology_placeholder(data_values)`
: Placeholder function to make a fake ontology entry.

Should only be used for testing.

Args:
data_values: a values dict with a string value representing an ontology label

Returns:
a dict of the format:
{"id": "placeholder","label": data_values}


`pipe_delim(data_values)`
: Takes a string and splits it into an array based on a pipe delimiter.

Args:
data_values: values dict with single pipe-delimited string, e.g. "a|b|c"

Returns:
a list of strings split by pipe, e.g. ["a","b","c"]


`placeholder(data_values)`
: Return a dict with a placeholder key.


`single_date(data_values)`
: Parses a single date to YYYY-MM format.

Args:
data_values: a value dict with a date

Returns:
a string of the format YYYY-MM, or None if blank/unparseable


`single_val(data_values)`
: Parse a values dict and return the input as a single value.

Args:
data_values: a dict with values to be squashed

Returns:
A single value with any null values removed
None if list is empty or contains only 'nan', 'NaN', 'NAN'

Raises:
MappingError if multiple values found
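
The behaviour described above can be sketched as below. This is an assumption-laden illustration, not the library code: `sketch_single_val` is a hypothetical name, it takes a plain list instead of a values dict, the `MappingError` class is a local stand-in, and treating any second remaining value (even an identical one) as an error is a guess.

```python
class MappingError(Exception):
    """Local stand-in for the library's MappingError."""


NULL_STRINGS = {"nan", "NaN", "NAN"}


def sketch_single_val(values: list):
    """Hypothetical helper: squash a list of values down to one value."""
    # Drop None and the documented null spellings.
    cleaned = [v for v in values if v is not None and v not in NULL_STRINGS]
    if not cleaned:
        return None
    if len(cleaned) > 1:
        raise MappingError(f"expected a single value, found {len(cleaned)}")
    return cleaned[0]
```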

28 changes: 14 additions & 14 deletions moh_template.csv
@@ -5,9 +5,9 @@
DONOR.INDEX, {indexed_on(DONOR_SHEET.submitter_donor_id)}
DONOR.INDEX.cause_of_death, {single_val(DONOR_SHEET.cause_of_death)}
DONOR.INDEX.date_resolution, {single_val(DONOR_SHEET.date_resolution)}
-DONOR.INDEX.date_alive_after_lost_to_followup, {single_date(DONOR_SHEET.date_alive_after_lost_to_followup)}
-DONOR.INDEX.date_of_birth, {single_date(DONOR_SHEET.date_of_birth)}
-DONOR.INDEX.date_of_death, {single_date(DONOR_SHEET.date_of_death)}
+DONOR.INDEX.date_alive_after_lost_to_followup, {date_interval(DONOR_SHEET.date_alive_after_lost_to_followup)}
+DONOR.INDEX.date_of_birth, {date_interval(DONOR_SHEET.date_of_birth)}
+DONOR.INDEX.date_of_death, {date_interval(DONOR_SHEET.date_of_death)}
DONOR.INDEX.gender, {single_val(DONOR_SHEET.gender)}
DONOR.INDEX.is_deceased, {boolean(DONOR_SHEET.is_deceased)}
DONOR.INDEX.lost_to_followup_after_clinical_event_identifier, {single_val(DONOR_SHEET.lost_to_followup_after_clinical_event_identifier)}
@@ -24,7 +24,7 @@ DONOR.INDEX.primary_diagnoses.INDEX.clinical_n_category, {single_val(PRIMARY_DIA
DONOR.INDEX.primary_diagnoses.INDEX.clinical_stage_group, {single_val(PRIMARY_DIAGNOSES_SHEET.clinical_stage_group)}
DONOR.INDEX.primary_diagnoses.INDEX.clinical_t_category, {single_val(PRIMARY_DIAGNOSES_SHEET.clinical_t_category)}
DONOR.INDEX.primary_diagnoses.INDEX.clinical_tumour_staging_system, {single_val(PRIMARY_DIAGNOSES_SHEET.clinical_tumour_staging_system)}
-DONOR.INDEX.primary_diagnoses.INDEX.date_of_diagnosis, {single_date(PRIMARY_DIAGNOSES_SHEET.date_of_diagnosis)}
+DONOR.INDEX.primary_diagnoses.INDEX.date_of_diagnosis, {date_interval(PRIMARY_DIAGNOSES_SHEET.date_of_diagnosis)}
DONOR.INDEX.primary_diagnoses.INDEX.laterality, {single_val(PRIMARY_DIAGNOSES_SHEET.laterality)}
DONOR.INDEX.primary_diagnoses.INDEX.lymph_nodes_examined_method, {single_val(PRIMARY_DIAGNOSES_SHEET.lymph_nodes_examined_method)}
DONOR.INDEX.primary_diagnoses.INDEX.lymph_nodes_examined_status, {single_val(PRIMARY_DIAGNOSES_SHEET.lymph_nodes_examined_status)}
@@ -41,7 +41,7 @@ DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.percent_tumour_cells_range,
DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.reference_pathology_confirmed_diagnosis, {single_val(SPECIMENS_SHEET.reference_pathology_confirmed_diagnosis)}
DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.reference_pathology_confirmed_tumour_presence, {single_val(SPECIMENS_SHEET.reference_pathology_confirmed_tumour_presence)}
DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.specimen_anatomic_location, {single_val(SPECIMENS_SHEET.specimen_anatomic_location)}
-DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.specimen_collection_date, {single_date(SPECIMENS_SHEET.specimen_collection_date)}
+DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.specimen_collection_date, {date_interval(SPECIMENS_SHEET.specimen_collection_date)}
DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.specimen_laterality, {single_val(SPECIMENS_SHEET.specimen_laterality)}
DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.specimen_processing, {single_val(SPECIMENS_SHEET.specimen_processing)}
DONOR.INDEX.primary_diagnoses.INDEX.specimens.INDEX.specimen_storage, {single_val(SPECIMENS_SHEET.specimen_storage)}
@@ -64,10 +64,10 @@ DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.response_to_treatment, {sin
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.response_to_treatment_criteria_method, {single_val(TREATMENTS_SHEET.response_to_treatment_criteria_method)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.status_of_treatment, {single_val(TREATMENTS_SHEET.status_of_treatment)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.submitter_treatment_id, {single_val(TREATMENTS_SHEET.submitter_treatment_id)}
-DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.treatment_end_date, {single_date(TREATMENTS_SHEET.treatment_end_date)}
+DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.treatment_end_date, {date_interval(TREATMENTS_SHEET.treatment_end_date)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.treatment_intent, {single_val(TREATMENTS_SHEET.treatment_intent)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.treatment_setting, {single_val(TREATMENTS_SHEET.treatment_setting)}
-DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.treatment_start_date, {single_date(TREATMENTS_SHEET.treatment_start_date)}
+DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.treatment_start_date, {date_interval(TREATMENTS_SHEET.treatment_start_date)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.treatment_type, {pipe_delim(TREATMENTS_SHEET.treatment_type)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.chemotherapies.INDEX, {indexed_on(CHEMOTHERAPIES_SHEET.submitter_treatment_id)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.chemotherapies.INDEX.actual_cumulative_drug_dose, {integer(CHEMOTHERAPIES_SHEET.actual_cumulative_drug_dose)}
@@ -116,8 +116,8 @@ DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.surgeries.INDEX.tumour_leng
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.surgeries.INDEX.tumour_width, {integer(SURGERIES_SHEET.tumour_width)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX, {indexed_on(FOLLOWUPS_SHEET.submitter_treatment_id)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.anatomic_site_progression_or_recurrence, {pipe_delim(FOLLOWUPS_SHEET.anatomic_site_progression_or_recurrence)}
-DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.date_of_followup, {single_date(FOLLOWUPS_SHEET.date_of_followup)}
-DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.date_of_relapse, {single_date(FOLLOWUPS_SHEET.date_of_relapse)}
+DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.date_of_followup, {date_interval(FOLLOWUPS_SHEET.date_of_followup)}
+DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.date_of_relapse, {date_interval(FOLLOWUPS_SHEET.date_of_relapse)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.disease_status_at_followup, {single_val(FOLLOWUPS_SHEET.disease_status_at_followup)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.method_of_progression_status, {pipe_delim(FOLLOWUPS_SHEET.method_of_progression_status)}
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.recurrence_m_category, {single_val(FOLLOWUPS_SHEET.recurrence_m_category)}
@@ -129,8 +129,8 @@ DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.relapse_typ
DONOR.INDEX.primary_diagnoses.INDEX.treatments.INDEX.followups.INDEX.submitter_follow_up_id, {single_val(FOLLOWUPS_SHEET.submitter_follow_up_id)}
DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX, {indexed_on(FOLLOWUPS_SHEET.submitter_primary_diagnosis_id)}
DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.anatomic_site_progression_or_recurrence, {pipe_delim(FOLLOWUPS_SHEET.anatomic_site_progression_or_recurrence)}
-DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.date_of_followup, {single_date(FOLLOWUPS_SHEET.date_of_followup)}
-DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.date_of_relapse, {single_date(FOLLOWUPS_SHEET.date_of_relapse)}
+DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.date_of_followup, {date_interval(FOLLOWUPS_SHEET.date_of_followup)}
+DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.date_of_relapse, {date_interval(FOLLOWUPS_SHEET.date_of_relapse)}
DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.disease_status_at_followup, {single_val(FOLLOWUPS_SHEET.disease_status_at_followup)}
DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.method_of_progression_status, {pipe_delim(FOLLOWUPS_SHEET.method_of_progression_status)}
DONOR.INDEX.primary_diagnoses.INDEX.followups.INDEX.recurrence_m_category, {single_val(FOLLOWUPS_SHEET.recurrence_m_category)}
@@ -168,11 +168,11 @@ DONOR.INDEX.biomarkers.INDEX.submitter_follow_up_id, {single_val(BIOMARKERS_SHEE
DONOR.INDEX.biomarkers.INDEX.submitter_primary_diagnosis_id, {single_val(BIOMARKERS_SHEET.submitter_primary_diagnosis_id)}
DONOR.INDEX.biomarkers.INDEX.submitter_specimen_id, {single_val(BIOMARKERS_SHEET.submitter_specimen_id)}
DONOR.INDEX.biomarkers.INDEX.submitter_treatment_id, {single_val(BIOMARKERS_SHEET.submitter_treatment_id)}
-DONOR.INDEX.biomarkers.INDEX.test_date, {single_date(BIOMARKERS_SHEET.test_date)}
+DONOR.INDEX.biomarkers.INDEX.test_date, {date_interval(BIOMARKERS_SHEET.test_date)}
DONOR.INDEX.followups.INDEX, {moh_indexed_on_donor_if_others_absent(FOLLOWUPS_SHEET.submitter_donor_id, FOLLOWUPS_SHEET.submitter_primary_diagnosis_id, FOLLOWUPS_SHEET.submitter_treatment_id)}
DONOR.INDEX.followups.INDEX.anatomic_site_progression_or_recurrence, {pipe_delim(FOLLOWUPS_SHEET.anatomic_site_progression_or_recurrence)}
-DONOR.INDEX.followups.INDEX.date_of_followup, {single_date(FOLLOWUPS_SHEET.date_of_followup)}
-DONOR.INDEX.followups.INDEX.date_of_relapse, {single_date(FOLLOWUPS_SHEET.date_of_relapse)}
+DONOR.INDEX.followups.INDEX.date_of_followup, {date_interval(FOLLOWUPS_SHEET.date_of_followup)}
+DONOR.INDEX.followups.INDEX.date_of_relapse, {date_interval(FOLLOWUPS_SHEET.date_of_relapse)}
DONOR.INDEX.followups.INDEX.disease_status_at_followup, {single_val(FOLLOWUPS_SHEET.disease_status_at_followup)}
DONOR.INDEX.followups.INDEX.method_of_progression_status, {pipe_delim(FOLLOWUPS_SHEET.method_of_progression_status)}
DONOR.INDEX.followups.INDEX.recurrence_m_category, {single_val(FOLLOWUPS_SHEET.recurrence_m_category)}