
Restructured pipeline

@valscherz valscherz released this 11 May 10:34
· 638 commits to master since this release

This release is the first since the complete restructuring of the pipeline:

  • A Singularity container is now defined for each rule, instead of the single large, hard-to-maintain container used previously.
  • Preprocessing and formatting of the reference databases for the different classifiers is now in a dedicated workflow.
  • The original RDP classifier is now implemented, with pretraining of the classifier during DB processing and optional rules to validate its performance.
  • Reference databases were removed from this git repository because they were too big. Their location (path) and name (directory at this path) must now be defined in the config.
  • Cutadapt settings were changed to filter out sequences without a primer at the expected position. This avoids including sequences in the wrong orientation, which would distort the phylogeny.
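
The primer filter described in the last point can be illustrated with a minimal Python sketch. It is not the pipeline's code, and the release notes do not state which Cutadapt options are used; in Cutadapt itself this behaviour is typically obtained with an anchored adapter combined with `--discard-untrimmed`. The function names, the short primer and the reads below are hypothetical, and only substitutions (no indels) are counted, which is simpler than Cutadapt's error model.

```python
from typing import Iterable, Iterator, Tuple

# IUPAC nucleotide codes mapped to the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read: str, primer: str, max_mismatches: int = 0) -> bool:
    """Return True if `read` begins with `primer`, anchored at position 0."""
    if len(read) < len(primer):
        return False
    # zip() stops at the end of the primer, so only the read prefix is compared.
    mismatches = sum(
        base not in IUPAC.get(code, "")
        for base, code in zip(read.upper(), primer.upper())
    )
    return mismatches <= max_mismatches


def filter_reads(
    reads: Iterable[Tuple[str, str]], primer: str, max_mismatches: int = 0
) -> Iterator[Tuple[str, str]]:
    """Yield (name, trimmed sequence) for reads that carry the primer.

    Reads without the primer at the expected position are dropped, which
    removes reads sequenced in the wrong orientation.
    """
    for name, seq in reads:
        if starts_with_primer(seq, primer, max_mismatches):
            yield name, seq[len(primer):]


# Hypothetical example data: one read in the expected orientation, one not.
example_reads = [("fwd", "GTGCCAGAAAT"), ("rev", "ATTTCTGGCAC")]
kept = list(filter_reads(example_reads, primer="GTGYCAG"))
```

Here only the read named `fwd` is kept, with the primer removed from its start; the reverse-oriented read is discarded instead of being passed on to the phylogeny.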

Traceability is now improved by:

  • Recording the current commit and user in the logs, including for DB processing.
  • Generating a hash during DB processing.
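
As a rough illustration of these two points, the sketch below hashes the content of a database directory and assembles a log line with the user and the current git commit. It is an assumption-laden example, not the pipeline's implementation: the choice of SHA-256, the function names and the log line format are all hypothetical.

```python
import getpass
import hashlib
import subprocess
from pathlib import Path


def hash_directory(db_dir: str) -> str:
    """Return a SHA-256 digest over all files in `db_dir`.

    Files are visited in sorted order and their relative paths are mixed
    into the digest, so the result is stable across runs and machines.
    """
    digest = hashlib.sha256()
    root = Path(db_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        with path.open("rb") as handle:
            # Read in 1 MiB chunks to keep memory use low for large databases.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def current_commit(repo_dir: str = ".") -> str:
    """Return the checked-out git commit hash, or "unknown" if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def provenance_line(db_dir: str, repo_dir: str = ".") -> str:
    """Build one log line with user, commit and database hash (hypothetical format)."""
    return (
        f"user={getpass.getuser()} "
        f"commit={current_commit(repo_dir)} "
        f"db_hash={hash_directory(db_dir)}"
    )
```

Writing such a line to the log of each run would make it possible to tell later which code version and which database content produced a given result.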

Future improvements:

  • Check the reference DB hash before taxonomic assignment.
  • Run the Shiny app in a container.
  • Generate a report to QC the whole process.
  • Document with Read the Docs.
  • Implement Travis CI to validate releases.