This repository has been archived by the owner on Oct 2, 2021. It is now read-only.

catastrophy-pipeline

Run the catastrophy fungal trophy classifier for many genomes.

Nextflow

The CATAStrophy Python package now bundles a pipeline that runs HMMER in parallel for you, so this Nextflow pipeline will no longer be maintained.

Details of the new pipeline are here: https://github.com/ccdmb/catastrophy#using-the-catastrophy-pipeline

Introduction

This pipeline automates the process of running HMMER3 and CATAStrophy on many genomes.

The pipeline is built using Nextflow (version >=20.01.0), a workflow tool that runs tasks portably across multiple compute infrastructures. It comes with Docker containers, making installation trivial and results highly reproducible.

This documentation is a bit sparse right now. We're all time poor. If you are having trouble, please don't hesitate to raise an issue on GitHub or email me.

Quick Start

i. Install Nextflow.

ii. Install one of Docker, Singularity, or Conda.

iii. Download the pipeline and test it on a minimal dataset with a single command:

nextflow run ccdmb/catastrophy-pipeline -profile test,<docker/singularity/conda>

iv. Start running your own analysis!

nextflow run ccdmb/catastrophy-pipeline -profile <docker/singularity/conda> --proteomes 'proteomes/*.fasta' --dbcan_version 8
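Note that the glob passed to --proteomes is quoted in the command above, so the shell hands the pattern to Nextflow unexpanded. Before launching a long run, it can help to confirm that the pattern matches the files you expect. The following is a minimal sketch assuming a bash-like shell; `count_proteomes` is a hypothetical helper and not part of the pipeline, and it does not handle paths containing spaces.

```shell
# Hypothetical pre-flight helper (not part of catastrophy-pipeline).
# Prints how many existing files match the given glob pattern.
count_proteomes() {
  pattern="$1"
  n=0
  # $pattern is deliberately unquoted so the shell expands the glob here.
  for f in $pattern; do
    # An unmatched glob is left as the literal pattern, so test for existence.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Each matching file would be treated as a separate organism by the pipeline.
echo "proteome files found: $(count_proteomes 'proteomes/*.fasta')"
```

If the count is zero, check the path and file extension before running the pipeline.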

Parameters

| Parameter | Type or default | Description |
| --- | --- | --- |
| --help | flag | Show help text and exit. |
| --proteomes | File or glob of files | The proteins in FASTA format that catastrophy should classify. Each file is treated as a separate organism. |
| --dbcan_version | 4, 5, 6, 7 or 8 | The version of dbCAN to count CAZymes from. By default, the pipeline will attempt to download the correct dbCAN database for this version and use that. |
| --dbcan | File | A copy of the dbCAN HMMER3 formatted database of CAZymes. If you use this option, you must also provide the --dbcan_version number. |
| --dbcan_url | URL | The URL to download the dbCAN HMMER3 formatted database from. If you use this option, you must also provide the --dbcan_version number. |
| --outdir | results | The directory to write results to. |

I highly recommend that you use the --dbcan_version flag without --dbcan or --dbcan_url. This will ensure that the correct model and database versions are used.
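The rule that --dbcan and --dbcan_url must be accompanied by --dbcan_version can be checked before a run is launched. This is only a sketch: `check_dbcan_flags` is a hypothetical wrapper function that is not shipped with the pipeline, `my_dbcan.hmm` is a made-up file name, and the function looks only at the flag names, not at their values.

```shell
# Hypothetical helper (not part of catastrophy-pipeline).
# Returns non-zero if --dbcan or --dbcan_url is given without --dbcan_version.
check_dbcan_flags() {
  has_version=0
  needs_version=0
  for arg in "$@"; do
    case "$arg" in
      --dbcan_version) has_version=1 ;;
      --dbcan|--dbcan_url) needs_version=1 ;;
    esac
  done
  if [ "$needs_version" -eq 1 ] && [ "$has_version" -eq 0 ]; then
    echo "error: --dbcan and --dbcan_url require --dbcan_version" >&2
    return 1
  fi
  return 0
}

# Example: a custom database file (made-up name) together with its version.
check_dbcan_flags --proteomes 'proteomes/*.fasta' --dbcan my_dbcan.hmm --dbcan_version 8 \
  && echo "dbCAN flags look consistent"
```

The same arguments can then be passed on to `nextflow run ccdmb/catastrophy-pipeline` once the check succeeds.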

Nextflow can run tasks in parallel; I will add some documentation on this later. Generally, it should handle some of this automatically, but more advanced setups such as supercomputers or distributed cloud environments are more complex.