The repository serves as a resource for researchers and the data coordination team to harmonize clinical metadata collection. It leverages existing ontologies and standards and provides an extensible, interoperable framework for ease of data sharing.
The schema framework is divided into three parts, defined as follows:
Term | Definition |
---|---|
Base | Contains common data fields shared across all domains (e.g., patient demographics, vital status, laboratory results).<br>Defined and maintained by PCGL.<br>Drives the Research Portal for data exploration, ensuring all users interact with a consistent set of data elements and enhancing data interoperability. |
Extension | Includes domain-specific data fields unique to each study or disease area.<br>Collaboratively developed by each individual Program/Study according to guidelines and templates provided by PCGL.<br>Extends the base schema to meet the precise needs of each study without affecting the base schema. |
Custom | The result of merging the base schema and the extension schema.<br>Represents the complete schema used by a program or study.<br>Registered in Lectern (Schema Registry).<br>Stored, managed and versioned by Lectern. |
** INSERT Extensible schema DIAGRAM HERE **
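To make the layering concrete, below is a minimal sketch in LinkML (the schema language used in this repository, described further down). All identifiers here (schema names, entities, fields) are hypothetical and chosen purely for illustration; they are not the actual PCGL definitions.

```yaml
# base schema (simplified, hypothetical sketch; not the actual PCGL base)
id: https://example.org/base
name: base
classes:
  Participant:
    description: Core participant entity shared by all domains
    attributes:
      submitter_participant_id:
        identifier: true
        range: string
      vital_status:
        range: string
---
# extension schema (simplified, hypothetical sketch)
id: https://example.org/example_extension
name: example_extension
imports:
  - base                      # pulls in the shared base entities
classes:
  ExampleParticipant:
    is_a: Participant         # inherits every base field unchanged
    description: Study-specific additions to the base Participant
    attributes:
      example_risk_score:     # hypothetical domain-specific field
        range: integer
# The Custom schema is the merged result of these two files,
# produced by the scripts described later in this document.
```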
Within Base, Extension and Custom are Entities that represent objects within the schema and serve as the basis for information collection. Types of entities include participant, sample, treatment, etc.
Entities contain fields, each of which collects a specific type of information, for example a status, a metric, a measurement or an ID.
** INSERT ER DIAGRAM HERE **
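As a rough illustration, the fragment below sketches what such field definitions can look like in LinkML; the entity, field names and constraints are hypothetical, not taken from the PCGL schemas.

```yaml
# Hypothetical field (slot) definitions within an entity (illustration only;
# names and constraints are not taken from the actual PCGL schemas)
classes:
  Sample:
    attributes:
      submitter_sample_id:
        identifier: true            # an ID field
        range: string
        pattern: "^[A-Za-z0-9_-]+$"
      sample_status:
        range: SampleStatusEnum     # a status field drawn from a fixed value set
      tumour_fraction:
        range: float                # a numeric measurement field
        minimum_value: 0
        maximum_value: 1
enums:
  SampleStatusEnum:
    permissible_values:
      Registered: {}
      Processed: {}
      Failed: {}
```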
The schemas are coded in LinkML format. The following are reasons for utilizing LinkML:
- Schemas can be used with DataHarmonizer, a browser-based spreadsheet editor, locally and offline
- Data can be validated via the command line, locally and offline
- LinkML supports object-like inheritance
- LinkML supports mappings to established ontologies (see the sketch after this list)
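As an example of the ontology-mapping point above, the hypothetical fragment below binds a field and its permissible values to ontology terms via LinkML's `exact_mappings` and `meaning` keys; the CURIEs shown are placeholders, not the terms PCGL actually uses.

```yaml
# Hypothetical ontology bindings (the CURIEs below are illustrative placeholders,
# not the terms actually used by PCGL)
prefixes:
  NCIT: http://purl.obolibrary.org/obo/NCIT_
slots:
  vital_status:
    range: VitalStatusEnum
    exact_mappings:
      - NCIT:C25717             # placeholder mapping for the field itself
enums:
  VitalStatusEnum:
    permissible_values:
      Alive:
        meaning: NCIT:C37987    # placeholder ontology term
      Deceased:
        meaning: NCIT:C28554    # placeholder ontology term
```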
Lectern and Lyric are both Overture products: Lectern manages schemas, while Lyric manages data ingestion and validation.
Lectern uses a custom JSON-formatted syntax, so schemas must be converted from LinkML into the Lectern-accepted format. We keep schemas in LinkML format because of the strengths listed above. For more details on the downsides, see restrictions/README.md.
The repository is organized into the following folders:
Folder | Purpose |
---|---|
Base | Contains YAML files of base entities |
Extension | Sub-divided per project, contains YAML files that extend base entities |
Custom | Sub-divided per project, contains 3 YAMLs. See README.md within folder for more details |
Scripts | Scripts for aggregating schemas and exporting them into various formats. See README.md within folder for more details |
Lectern | Sub-divided per project, JSON schema files containing entities aggregated into a single schema. See README.md within folder for more details |
Restrictions | Sub-divided per project, JSON schema files containing specialized restrictions for entities. |
Test_data | Sub-divided per project, contains examples of good and bad data for testing. |
CSV | Sub-divided per project, contains the flattened CSV version of the custom YAML |
DataHarmonizer | Sub-divided per project, contains the zip-packaged DataHarmonizer for local offline validation. |
Typescript_export | Sub-divided per project, contains the TypeScript export used for DataHarmonizer. |
An update to any of the following schemas will require a full regeneration of resources:
- Base Schema (e.g. `base/participant.yaml`)
- Extension Schema (e.g. `extension/example/participant.yaml`)

To regenerate:
- Update the Custom Schema (e.g. `extension/custom/participant.yaml`)
- Use `scripts/generateCustomLinkmlFromReference.py` to generate `extension/custom/example_dh.yaml` and `extension/custom/example_full.yaml`
- Use `scripts/generateFlatCsvFromFullLinkml.py` and `extension/custom/example_full.yaml` to generate `csv/example/example.yaml`
- Use `scripts/generateLecternJsonFromCustomLinkml` and `extension/custom/example_full.yaml` to generate `lectern/example/example.json`
- Register `lectern/example/example.json` in Lectern per project
- Register the Lectern-provided IDs in Lyric
- Pull the latest version of https://github.com/cidgoh/DataHarmonizer locally
- Run `scripts/dh-validate.py` from the `DataHarmonizer` folder on `extension/custom/example_dh.yaml` to generate `web/templates/examples/schema.json`
- Copy `typescript_export/example/export.js` to `web/templates/examples`
- Compress the folder and copy it over to `dataHarmonizer/example/example.tar.gz`