Nextflow for single-cell RNAseq
Note: This pipeline is designed to be run after CellRanger Count. If the data pass the basic QC shown in the CellRanger web summary, this pipeline performs further QC and runs Seurat preprocessing and clustering.
This pipeline manages an scRNA-Seq workflow that starts from raw FASTQ files and converts them to standard file formats for use by downstream tools. The steps involved are:
- DoubletFinder: detects and removes doublets from the single-cell data.
- Add meta: adds the metadata from CellRanger Count and the user-provided sample sheet to the .h5 file.
- Create a QC report for all samples provided.
- Cell Clustering and Cell Type Annotation.
- Perform Trajectory Analysis.
This repository uses CellRanger Count to generate the CellRanger outs directory that is used downstream.
You can download the CellRanger software with this command:
wget -O cellranger-7.2.0.tar.gz ""
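Because the downstream steps depend on the CellRanger outs directory, it can help to confirm that a sample's outs directory looks complete before starting the pipeline. The following is a minimal sketch and not part of this repository: the function name `missing_outs_files` is hypothetical, and the list of expected files is an assumption based on what `cellranger count` typically writes, so adjust it to your CellRanger version.

```python
from pathlib import Path

# Files that `cellranger count` typically writes into <sample>/outs.
# This list is an assumption for illustration; adjust it to your CellRanger version.
EXPECTED_OUTS_FILES = (
    "web_summary.html",
    "metrics_summary.csv",
    "filtered_feature_bc_matrix.h5",
)


def missing_outs_files(outs_dir, expected=EXPECTED_OUTS_FILES):
    """Return the expected file names that are absent from outs_dir, in order."""
    outs = Path(outs_dir)
    return [name for name in expected if not (outs / name).is_file()]
```

Calling `missing_outs_files("path/to/sample/outs")` returns an empty list when every expected file is present; any names it returns point to a CellRanger run that may need to be repeated or inspected first.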
Some dependencies were developed internally by the BWH Bioinformatics and Genomics Hub.
Note: Currently it is much easier to have conda handle the packages.
To install Nextflow:
git clone
cd nextflow-scRNAseq
cd nextflow
tar -xvzf nextflow-22.10.6.tar.gz
cd nextflow-22.10.6/
sudo apt install openjdk-17-jre # if on linux
Nextflow should now be installed.
To create the conda environment with the dependencies installed:
cd nextflow-scRNAseq
cd env/
conda env create -f environment.yml
conda activate scrna_nextflow_pipeline
Rscript install_R_packages.R
pip install cellbender
cd ..
R # to start an interactive R session
# Leiden Algorithm Requirements
pip install leidenalg
pip install numpy
pip install pandas
nextflow/nextflow-22.10.6/nextflow run
To handle software dependencies and version control, a Dockerfile has been created.
To download the Docker image, run:
docker pull acicalo4/snscrnaseq:latest
To mount data from the local host into the Docker container, run, for example (the host path must be absolute):
docker run -t -i -v /path/to/data/you/want/mounted:/container/dir acicalo4/snscrnaseq /bin/bash
Nextflow parses a .csv file for the sample IDs and for the path to the directory that contains your project's FASTQ files. At a minimum, the .csv file must contain a sample_id column.
If you are working with an .xls/.xlsx file, please create a .csv file called samples.csv with a column labeled 'sample_id'.
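The sample sheet requirement above can be checked before launching a run. The sketch below is illustrative only and is not how the pipeline itself parses the file: the function name `read_sample_ids` is hypothetical, and rejecting empty or duplicate IDs is an assumption about what a usable sheet looks like, not a documented rule of this repository.

```python
import csv


def read_sample_ids(csv_path, id_column="sample_id"):
    """Return the sample IDs in csv_path, raising ValueError if the sheet is unusable."""
    # utf-8-sig drops the byte-order mark that Excel often adds to exported CSVs.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        rows = list(csv.reader(handle))
    if not rows:
        raise ValueError(f"{csv_path} is empty")

    header = [name.strip() for name in rows[0]]
    if id_column not in header:
        raise ValueError(f"{csv_path} has no '{id_column}' column; found {header}")

    col = header.index(id_column)
    # Skip blank rows and rows that are too short to hold the ID column.
    ids = [row[col].strip() for row in rows[1:] if len(row) > col and row[col].strip()]
    if not ids:
        raise ValueError(f"{csv_path} lists no sample IDs")

    # Assumption: each sample should appear once, so duplicates are reported.
    duplicates = sorted({s for s in ids if ids.count(s) > 1})
    if duplicates:
        raise ValueError(f"duplicate sample IDs in {csv_path}: {duplicates}")
    return ids
```

Running `read_sample_ids("samples.csv")` on a sheet exported from Excel returns the IDs in file order, and a `ValueError` names the problem when the sample_id column is missing or mislabeled.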
Example:
nextflow run -with-docker acicalo4/snscrnaseq