
Dependencies and Install

Jill V. Hagey, PhD edited this page Jun 5, 2024 · 51 revisions

Quick Start

  1. Install Nextflow (>=21.10.3,<24.01.0).

    There are several options for install if you do not already have it on your system:

    • Install into conda environment, which will require a version of Anaconda to be installed on your system.

      mamba create -n nextflow -c bioconda nextflow=21.10.6  
    • If you prefer to use curl or wget for the install, see the Nextflow Documentation.

  2. Install Docker or Singularity >=3.8.0 for full pipeline reproducibility.

  3. Download kraken database that is required for the kraken2 subworkflow of PHoeNIx.

  • For PHoeNIx <=1.1.1 you will need to download the public Standard-8 version kraken2 database created on May 17, 2021 from Ben Langmead's github page. The download link is https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz.

  • For PHoeNIx >=2.0.0 you will need to download the public Standard-8 version kraken2 database created on or after March 14th, 2023 from Ben Langmead's github page. You CANNOT use an older version of the public kraken databases on Ben Langmead's github page. We thank @BenLangmead and @jenniferlu717 for taking the time to include an extra file in public kraken databases created after March 14th, 2023 to allow them to work in PHoeNIx!

  4. (Optional) If you installed Nextflow via a conda environment, activate the environment with:

    conda activate nextflow
  5. Run PHoeNIx on the test sample bundled with the package using a single command:

    nextflow run cdcgov/phoenix -r v1.0.0 -profile <singularity/docker/custom>,test -entry PHOENIX --kraken2db $PATH_TO_DB

Note that this command clones (downloads) the repo to ~/.nextflow/assets/cdcgov/phoenix. See below for how to clone and have the software downloaded to a different location.

  • The pipeline comes with config profiles called docker and singularity which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
  • Please check nf-core/configs (https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
  6. Start running your own analysis with a samplesheet!

    nextflow run cdcgov/phoenix -r v1.0.0 -profile <singularity/docker/custom> -entry PHOENIX --input <path_to_samplesheet.csv> --kraken2db $PATH_TO_DB

All the Details

Install and Set up Dependencies

1. Install Nextflow (>=21.10.3).

There are several options for install if you do not already have it on your system:

  • Use curl or wget for the install; see the Nextflow Documentation.

  • A good way to install Nextflow is with conda or mamba. Mamba is much faster, so we recommend it. This requires installing Anaconda first. A short tutorial on Anaconda and its setup can be found here: https://jvhagey.github.io/Tutorials/mydoc_Installation.html

    If you need mamba installed and you already have anaconda on your system run:

    conda install -c conda-forge mamba

    To install Nextflow run:

    mamba create -n nextflow -c conda-forge -c bioconda nextflow=21.10.6  

    Then you can activate the environment with:

    conda activate nextflow
    - You will run PHoeNIx from inside this environment!
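
Before going further, it can help to confirm that the installed Nextflow version falls inside the range quoted in the Quick Start (>=21.10.3,<24.01.0). The following is a minimal sketch, not part of PHoeNIx: the helper names `version_ge` and `nextflow_version_ok` are hypothetical, and it assumes a `sort` that supports `-V` (version sort), such as the one in GNU coreutils.

```shell
# Sketch: check a Nextflow version string against the range from the Quick Start.
# Assumes `sort -V` is available (GNU coreutils); helper names are hypothetical.

# version_ge A B: succeeds when version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# nextflow_version_ok V: succeeds when 21.10.3 <= V < 24.01.0.
nextflow_version_ok() {
    version_ge "$1" "21.10.3" && ! version_ge "$1" "24.01.0"
}

# Example with the version pinned in the mamba command above.
if nextflow_version_ok "21.10.6"; then
    echo "21.10.6 is in the supported range"
fi
```

You would pass in whatever version string your own install reports; how you extract that string is left to you.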

2. Configuring Nextflow for command-line interface (CLI) users

Configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile. You can chain multiple config profiles in a comma-separated string.

  • The pipeline comes with config profiles called docker and singularity which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
  • Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
  • If you are using singularity and are persistently observing issues downloading Singularity images directly due to timeout or network issues, you can use the --singularity_pull_docker_container parameter to pull and convert the Docker image instead. Alternatively, you can use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
    • To add NXF_SINGULARITY_CACHEDIR to your bash profile run the following:
      1. Open your ~/.bash_profile in a text editor, for example by running nano ~/.bash_profile.
      2. Inside the ~/.bash_profile add the following lines:
      export NXF_SINGULARITY_CACHEDIR=<path_to_folder>/Singularity_Containers
      export PATH
      3. Here <path_to_folder> is the full path to where you want to store the folder (it is a placeholder, not the shell's $PATH variable). You can name the Singularity_Containers folder whatever you want. You will need to restart your terminal or run source ~/.bash_profile to allow Nextflow to see the new path.
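
The manual edit of ~/.bash_profile described above could also be scripted. This is a hedged sketch only: `add_singularity_cache_export` is a hypothetical helper name, and the function simply appends the export line if an identical line is not already in the file, so running it twice does not duplicate the entry.

```shell
# Sketch: add the NXF_SINGULARITY_CACHEDIR export to a profile file only once.
# add_singularity_cache_export is a hypothetical helper, not part of Nextflow or PHoeNIx.
add_singularity_cache_export() {
    profile_file="$1"   # e.g. "$HOME/.bash_profile"
    cache_dir="$2"      # full path to the folder that will hold the images
    export_line="export NXF_SINGULARITY_CACHEDIR=\"${cache_dir}\""

    mkdir -p "$cache_dir" || return 1   # create the cache folder if it is missing
    touch "$profile_file" || return 1   # make sure the profile file exists

    # Append the line only when an identical line is not already present.
    if ! grep -qxF -- "$export_line" "$profile_file"; then
        printf '%s\n' "$export_line" >> "$profile_file"
    fi
}
```

A call such as `add_singularity_cache_export "$HOME/.bash_profile" "$HOME/Singularity_Containers"` followed by `source ~/.bash_profile` would mirror the manual steps; the folder name is only an example.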

3. Installing Container Software

Install Docker or Singularity (>=3.8.0) for full pipeline reproducibility.

4. Download Database files for Kraken2

  • For PHoeNIx <=1.1.1 you will need to download the public Standard-8 version kraken2 database created on May 17, 2021 from Ben Langmead's github page. The download link is https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz.

  • For PHoeNIx >=2.0.0 you will need to download the public Standard-8 version kraken2 database created on or after March 14th, 2023 from Ben Langmead's github page. You CANNOT use an older version of the public kraken databases on Ben Langmead's github page. We thank @BenLangmead and @jenniferlu717 for taking the time to include an extra file in public kraken databases created after March 14th, 2023 to allow them to work in PHoeNIx!
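
As a rough illustration of the download step, the sketch below fetches a database tarball with curl, unpacks it, and checks that the three core kraken2 index files (hash.k2d, opts.k2d, taxo.k2d) are present. The function names are hypothetical, and it assumes the tarball unpacks its files directly into the target folder and that --kraken2db is then pointed at that folder. It does not check for the extra file required by PHoeNIx >=2.0.0.

```shell
# Sketch: download and unpack a kraken2 database, then sanity-check the result.
# Both function names are hypothetical helpers, not part of PHoeNIx or kraken2.

# fetch_kraken2_db URL DEST_DIR: download the tarball at URL and unpack it into DEST_DIR.
fetch_kraken2_db() {
    url="$1"
    dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    tarball="$dest_dir/$(basename "$url")"
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to the named file
    curl -fL -o "$tarball" "$url" || return 1
    tar -xzf "$tarball" -C "$dest_dir"
}

# kraken2_db_looks_complete DIR: succeeds when the three core index files exist in DIR.
kraken2_db_looks_complete() {
    for f in hash.k2d opts.k2d taxo.k2d; do
        [ -f "$1/$f" ] || { echo "missing $1/$f" >&2; return 1; }
    done
}

# Example for PHoeNIx <=1.1.1 (not run here because it downloads several GB):
#   fetch_kraken2_db "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_8gb_20210517.tar.gz" "$HOME/kraken2_db"
#   kraken2_db_looks_complete "$HOME/kraken2_db" && export PATH_TO_DB="$HOME/kraken2_db"
```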

Run PHoeNIx

There are two options for running PHoeNIx; the difference is where you want it installed:

  1. Install the latest version by cloning the PHoeNIx GitHub repo into a folder of your choosing:

    cd $PATH_TO_INSTALL
    git clone https://github.com/CDCgov/phoenix 

    If you want to run a particular version then you can download that using the -b argument like this:

    git clone -b v1.0.0 https://github.com/CDCgov/phoenix 

    Then you can run it (make sure you activate your conda environment first, if that is how nextflow is installed!):

    nextflow run $PATH_TO_INSTALL/phoenix/main.nf -entry PHOENIX -profile <singularity/docker/custom> --input <path_to_samplesheet.csv> --kraken2db $PATH_TO_DB
  2. Alternatively, run PHoeNIx directly (it will download to ~/.nextflow/assets/cdcgov/phoenix):

    nextflow run cdcgov/phoenix -r v1.0.0 -entry PHOENIX -profile <singularity/docker/custom> --input <path_to_samplesheet.csv> --kraken2db $PATH_TO_DB

    Running PHoeNIx this way means Nextflow will pull the version specified with -r from GitHub and install it into ~/.nextflow/assets/cdcgov/phoenix.
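
A common source of failed runs is a mistyped samplesheet or database path. The sketch below is one possible pre-flight wrapper, under the assumption that you launch with the direct `nextflow run cdcgov/phoenix` form shown above. `phoenix_run_cmd` is a hypothetical helper that checks both paths exist and then prints the command instead of executing it, so you can review it first.

```shell
# Sketch: validate inputs and print the PHoeNIx launch command for review.
# phoenix_run_cmd is a hypothetical helper; it prints the command and does not run Nextflow.
phoenix_run_cmd() {
    profile="$1"            # e.g. singularity, docker, or a custom profile
    samplesheet="$2"        # path to the samplesheet CSV
    kraken_db="$3"          # path to the kraken2 database
    revision="${4:-v1.0.0}" # version passed to -r (defaults to the one used on this page)

    if [ ! -f "$samplesheet" ]; then
        echo "samplesheet not found: $samplesheet" >&2
        return 1
    fi
    if [ ! -e "$kraken_db" ]; then
        echo "kraken2 database not found: $kraken_db" >&2
        return 1
    fi

    printf 'nextflow run cdcgov/phoenix -r %s -entry PHOENIX -profile %s --input %s --kraken2db %s\n' \
        "$revision" "$profile" "$samplesheet" "$kraken_db"
}
```

For example, `phoenix_run_cmd singularity samplesheet.csv "$PATH_TO_DB"` prints the command only when both paths exist. Paths containing spaces would need quoting before the printed command is pasted into a shell.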

Testing Install

To test that the pipeline was installed and configured correctly, run either:

nextflow run phoenix/main.nf -profile test,<singularity/docker/custom> -entry PHOENIX --kraken2db $PATH_TO_DB

or

nextflow run cdcgov/phoenix -r v1.0.0 -profile test,<singularity/docker/custom> -entry PHOENIX --kraken2db $PATH_TO_DB

This command will run the pipeline on preloaded data. If all goes well you should see some output that looks like this:

[Screenshot: run output]

As you can see from the screenshot, this takes ~18 mins to run 🐢 because the test is limited to 2 CPUs. To speed it up into 🐇 mode, open phoenix/conf/test.config, increase the max_cpus parameter, and save the file before running. Notice that some steps aren't run in this pipeline, specifically some SPADES_WF stats. These only run if a sample fails SPAdes in a way where contigs are created but not scaffolds, so this behavior is normal.
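
If you would rather not edit the file by hand, the max_cpus change could be scripted along these lines. This is only a sketch: `set_max_cpus` is a hypothetical helper, and it assumes the setting appears in test.config as a line of the form `max_cpus = 2` (an integer after the equals sign), which you should confirm in your copy of the file.

```shell
# Sketch: change the integer on the max_cpus line of a Nextflow config file.
# set_max_cpus is a hypothetical helper; it assumes a line shaped like "max_cpus = 2".
set_max_cpus() {
    config_file="$1"   # e.g. phoenix/conf/test.config
    cpus="$2"          # new CPU limit, e.g. 8
    tmp_file="$(mktemp)" || return 1

    # Rewrite only the number after "=" on lines that set max_cpus; copy all other lines as-is.
    awk -v n="$cpus" '
        /max_cpus[ \t]*=/ { sub(/=[ \t]*[0-9]+/, "= " n) }
        { print }
    ' "$config_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }

    # Write back through the original file so its permissions are kept.
    cat "$tmp_file" > "$config_file"
    rm -f "$tmp_file"
}
```

Calling `set_max_cpus phoenix/conf/test.config 8` before the test run would then raise the limit to 8 CPUs, provided your machine has that many available.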
