Restructured pipeline
This release is the first since the complete restructuring of the pipeline:
- A Singularity container is now defined for each rule, replacing the previous single large, hard-to-maintain container.
- Preprocessing and formatting for the different classifiers now run in a dedicated workflow.
- The original RDP classifier is now implemented. The classifier is pretrained during DB processing, and optional rules validate its performance.
- Reference databases were removed from this git repository because they were too large. Their location (path) and name (directory at this path) must now be defined in the config.
- Cutadapt settings were changed to filter out sequences without a primer at the expected position. This prevents sequences in the wrong orientation from being included, since they would distort the phylogeny.
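Since reference databases are no longer shipped with the repository, the workflow has to build their location from two config values. A minimal sketch, assuming the config is available as a Python dictionary; the keys `DBpath` and `DBname` and the function `reference_db_dir` are hypothetical, as the release does not name the actual config keys.

```python
from pathlib import Path
from typing import Mapping


def reference_db_dir(config: Mapping[str, str]) -> Path:
    """Resolve the reference database directory from the config.

    'DBpath' (location) and 'DBname' (directory at that location) are
    hypothetical key names used for illustration only.
    """
    db_dir = Path(config["DBpath"]) / config["DBname"]
    # Fail early with a clear message instead of letting a later rule
    # crash on a missing file.
    if not db_dir.is_dir():
        raise FileNotFoundError(f"Reference database not found: {db_dir}")
    return db_dir
```

Checking the directory up front surfaces a wrong path or name before any classifier rule runs.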
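The Cutadapt change can be expressed with options Cutadapt already provides. A sketch assuming paired-end reads with primers expected at the 5' end of each read: `^` anchors each primer to the start of its read, `--discard-untrimmed` drops reads in which the primer was not found, and `--pair-filter=any` discards the whole pair when either read fails. The helper name and the primer sequences are illustrative assumptions, not the pipeline's actual settings.

```python
import shlex
from typing import List


def cutadapt_command(fwd_primer: str, rev_primer: str,
                     r1_in: str, r2_in: str,
                     r1_out: str, r2_out: str) -> List[str]:
    """Build a paired-end Cutadapt call that keeps only read pairs whose
    primers sit at the start of the reads (hypothetical helper)."""
    return [
        "cutadapt",
        # '^' anchors each primer to the 5' end of its read, so a primer
        # found elsewhere in the read does not count as a match.
        "-g", f"^{fwd_primer}",
        "-G", f"^{rev_primer}",
        # Drop reads whose anchored primer was not found, such as reads
        # sequenced in the wrong orientation.
        "--discard-untrimmed",
        # Discard the pair if either read lacks its primer.
        "--pair-filter=any",
        "-o", r1_out,
        "-p", r2_out,
        r1_in,
        r2_in,
    ]


if __name__ == "__main__":
    # Example primer sequences, for illustration only.
    cmd = cutadapt_command("GTGYCAGCMGCCGCGGTAA", "GGACTACNVGGGTWTCTAAT",
                           "sample_R1.fastq.gz", "sample_R2.fastq.gz",
                           "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz")
    print(shlex.join(cmd))
```

Returning the command as a list rather than a single string avoids shell-quoting problems when it is passed to `subprocess.run`.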
Traceability is improved by:
- Recording the current commit and user in the logs, including for DB processing.
- Generating a hash during DB processing.
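Both traceability mechanisms can be sketched in a few lines of Python: the commit comes from `git rev-parse HEAD`, the user from `getpass.getuser()`, and the database hash is a SHA-256 digest over the sorted relative paths and contents of every file in the database directory. The function names and the choice of SHA-256 over a directory walk are assumptions; the release does not say how the pipeline actually computes its hash.

```python
import getpass
import hashlib
import logging
import subprocess
from pathlib import Path


def current_commit(repo_dir: str = ".") -> str:
    """Return the HEAD commit of the pipeline repository, or 'unknown'."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not inside a git repository.
        return "unknown"
    return result.stdout.strip()


def directory_hash(db_dir: str) -> str:
    """SHA-256 over the relative path and contents of every file in db_dir.

    Files are visited in sorted order so the digest does not depend on
    the order in which the filesystem lists them.
    """
    root = Path(db_dir)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separate the file name from its contents
        with path.open("rb") as handle:
            # Read in 1 MiB chunks so large reference files are never
            # loaded into memory at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def log_provenance(logger: logging.Logger, db_dir: str) -> None:
    """Write the commit, the user and the database hash to the logger."""
    logger.info("commit=%s user=%s", current_commit(), getpass.getuser())
    logger.info("db_hash=%s", directory_hash(db_dir))
```

Including relative paths in the digest means that renaming a file changes the hash, not only editing its contents.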
Future improvements:
- Check the reference DB hash before taxonomic assignment.
- Run the Shiny app in a container.
- Generate a report to QC the whole process.
- Document with Read the Docs.
- Set up Travis CI to validate releases.