R. Martín et al., “ONCOLINER: A new solution for monitoring, improving, and harmonizing somatic variant calling across genomic oncology centers,” Cell Genomics, vol. 4, no. 9. Elsevier BV, p. 100639, Sep. 2024. doi: 10.1016/j.xgen.2024.100639
This repository contains the scripts to run the variant callers originally used in ONCOLINER. The variant callers are executed from Bash scripts that use Singularity containers. The scripts are located in the executable_scripts/ folder of this repository. The container references are available in the variant caller list below.
The scripts for running the variant callers are Bash scripts that can be executed directly from the command line on almost any Unix-based system. The only dependency is Singularity (singularity-ce version 3.9.0 or later). The scripts are optimized for running in HPC environments without root privileges.
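A quick way to confirm the dependency is to compare the installed version against 3.9.0. The following is a minimal sketch, not part of the repository: the `version_ge` helper is a hypothetical name, it relies on GNU `sort -V`, and the parsing assumes that `singularity --version` prints a dotted version number somewhere in its output.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Assumes a `sort` that supports -V (version sort), as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED_VERSION="3.9.0"

if command -v singularity >/dev/null 2>&1; then
    # Take the first dotted number found in the version string.
    installed="$(singularity --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
    if version_ge "$installed" "$REQUIRED_VERSION"; then
        echo "Singularity $installed satisfies >= $REQUIRED_VERSION"
    else
        echo "Singularity $installed is older than $REQUIRED_VERSION" >&2
    fi
else
    echo "singularity not found in PATH" >&2
fi
```

The check only reports its findings; whether to abort on an older version is left to the caller.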
Variant caller | Variant types | Version | Singularity containers | License | Notes |
---|---|---|---|---|---|
cgpCaVEManWrapper | SNV | 1.6.0 | oncoliner_cgpcavemanwrapper:1.16.0 | AGPL-3.0 | cgpPindel must be executed first |
MuSE | SNV | 2.0 | oncoliner_muse:2.0 | GPL-2.0 | Does not support CRAM |
Shimmer | SNV | | oncoliner_shimmer:latest | Custom | Does not support CRAM |
Mutect2 (from GATK) | SNV/Indel | 4.2.6.1 | oncoliner_gatk:4.2.6.1 | Apache 2.0 | |
SAGE | SNV/Indel | 3.0 | oncoliner_sage:3.0 | GPL-3.0 | |
Strelka2 | SNV/Indel | 2.9.10 | oncoliner_strelka:2.9.10 | GPL-3.0 | |
cgpPindel | Indel | 3.9.0 | oncoliner_cgppindel:3.9.0 | AGPL-3.0 | |
SvABA | Indel/SV | 1.1.0 | oncoliner_svaba:1.1.0 | GPL-3.0 | |
BRASS | SV | 6.3.4 | oncoliner_brass:6.3.4 | AGPL-3.0 | |
Delly | SV | 1.1.6 | oncoliner_delly:1.1.6 | BSD-3 | |
GRIDSS2 (with GRIPSS) | SV | 2.13.2 | oncoliner_gridss:2.13.2 / GRIPSS JAR | GPL-3.0 | |
Manta | SV | 1.6.0 | oncoliner_manta:1.6.0 | GPL-3.0 | |
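The Notes column marks MuSE and Shimmer as not supporting CRAM. A pre-flight check along the following lines could catch an unsupported input before a job is submitted. This is a sketch under assumptions: `supports_cram` and `check_input` are hypothetical helpers that do not exist in the repository, and the file type is guessed from the `.cram` extension only.

```shell
# supports_cram CALLER: fail for the callers listed as not supporting CRAM.
supports_cram() {
    case "$1" in
        MuSE|Shimmer) return 1 ;;
        *) return 0 ;;
    esac
}

# check_input CALLER FILE: reject a .cram file for callers that cannot read it.
check_input() {
    caller="$1"
    sample="$2"
    case "$sample" in
        *.cram)
            if ! supports_cram "$caller"; then
                echo "$caller does not support CRAM: $sample" >&2
                return 1
            fi
            ;;
    esac
    return 0
}

# Example with hypothetical paths:
check_input Strelka2 /data/samples/tumor.cram && echo "Strelka2: input accepted"
check_input MuSE /data/samples/tumor.cram || echo "MuSE: convert the sample to BAM first"
```

Converting the sample to BAM beforehand (for instance with samtools) is one way to run these two callers on CRAM data.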
Downloading Singularity containers (using ORAS) does not require root privileges. To download any of the Singularity containers listed in this repository, use the following command:
singularity pull <variant_caller_name_version>.sif oras://ghcr.io/eucancan/<container_name:tag>
It is important that the container is named after the script that executes it. For example, the script executable_scripts/muse_2_0.sh requires the Singularity container to be named muse_2_0.sif.
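The naming rule can be applied mechanically by swapping the `.sh` suffix of the script for `.sif`. The sketch below illustrates this for MuSE; `container_for_script` is a hypothetical helper, and the pull command is left commented out so that the snippet has no side effects.

```shell
# container_for_script SCRIPT: print the container file name the script expects.
container_for_script() {
    printf '%s.sif\n' "$(basename "$1" .sh)"
}

SCRIPT="executable_scripts/muse_2_0.sh"
SIF="$(container_for_script "$SCRIPT")"
echo "$SIF"   # prints muse_2_0.sif

# Matching download for MuSE, using the oncoliner_muse:2.0 reference from the
# variant caller list:
# singularity pull "$SIF" oras://ghcr.io/eucancan/oncoliner_muse:2.0
```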
WARNING. Your institution may not allow you to download files directly from computing nodes. If that is the case, you will need to download the container on a different machine and then copy it to the computing node. For example, you could download the container on your local machine and then copy it to the computing node using scp:
scp <variant_caller_name_version>.sif <username>@<hostname>:<path_to_singularity_containers_storage_dir>
Running Singularity containers does not require root privileges. All the scripts to execute the variant callers are located in the executable_scripts/ folder of this repository. The scripts are named after the variant caller they execute and its version. For example, the script to execute MuSE v2.0 is located in executable_scripts/muse_2_0.sh.
All the scripts require the following parameters to be passed in the following order:
$WORKING_DIR # path to working directory
$OUTPUT_DIR # path to output directory
$EXTRA_DATA_DIR # path to extra data directory
$REF_VERSION # reference version (e.g. 37)
$NORMAL_SAMPLE # path to normal sample SAM/BAM/CRAM file
$TUMOR_SAMPLE # path to tumor sample SAM/BAM/CRAM file
$FASTA_REF # path to reference FASTA file
$NUM_CORES # number of cores to use
$MAX_MEMORY # maximum memory to use, in GB (e.g. 8)
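Because the parameters are positional, a missing or misplaced argument shifts every later value. A small check before launching a caller can catch this early. The following sketch is an assumption, not something the scripts are known to do themselves: `check_caller_args` is a hypothetical helper, and it assumes that $NUM_CORES and $MAX_MEMORY are positive integers.

```shell
# check_caller_args ARGS...: validate the nine positional parameters in the
# documented order (WORKING_DIR ... MAX_MEMORY).
check_caller_args() {
    if [ "$#" -ne 9 ]; then
        echo "expected 9 arguments, got $#" >&2
        return 1
    fi
    # Arguments 8 and 9 are NUM_CORES and MAX_MEMORY (in GB).
    for value in "$8" "$9"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "not a positive integer: $value" >&2
                return 1
                ;;
        esac
        if [ "$value" -le 0 ]; then
            echo "not a positive integer: $value" >&2
            return 1
        fi
    done
    return 0
}

# Example with hypothetical paths:
check_caller_args /scratch/work /scratch/out ./required_extra_data 37 \
    /data/normal.bam /data/tumor.bam /data/reference.fasta 8 32 \
    && echo "arguments look consistent"
```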
Some variant callers require extra data to be executed. The extra data required by each variant caller is available in the required_extra_data/ folder of this repository. If you are running the variant caller from the root of this repository, you can use the following command to set the $EXTRA_DATA_DIR environment variable:
export EXTRA_DATA_DIR=required_extra_data
Note: Due to size limitations, some files are not available in this repository and need to be downloaded from external sources. In these cases, a file with the same name but ending in .download will be present instead. This file contains the instructions and links to download the file.
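To see which files still have to be fetched, the `.download` placeholders can be listed with `find`. This is a minimal sketch; `list_placeholders` is a hypothetical helper, and the default directory assumes the command is run from the root of the repository.

```shell
# list_placeholders DIR: print every *.download placeholder found under DIR.
list_placeholders() {
    find "$1" -type f -name '*.download' | sort
}

EXTRA_DATA_DIR="${EXTRA_DATA_DIR:-required_extra_data}"

if [ -d "$EXTRA_DATA_DIR" ]; then
    list_placeholders "$EXTRA_DATA_DIR" | while IFS= read -r placeholder; do
        # Stripping the suffix gives the name of the file that is expected.
        echo "Still to download: ${placeholder%.download}"
        echo "  instructions in: $placeholder"
    done
fi
```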
The following example shows how to execute any of the variant callers from the root of this repository:
WORKING_DIR=/path/to/working/directory
OUTPUT_DIR=/path/to/output/directory
EXTRA_DATA_DIR=./required_extra_data
REF_VERSION=37
NORMAL_SAMPLE=/path/to/normal/sample.bam
TUMOR_SAMPLE=/path/to/tumor/sample.bam
FASTA_REF=/path/to/reference.fasta
NUM_CORES=8
MAX_MEMORY=32
singularity exec -e <SINGULARITY_CONTAINER> bash ./executable_scripts/variant_caller_X_X_X.sh "$WORKING_DIR" "$OUTPUT_DIR" "$EXTRA_DATA_DIR" "$REF_VERSION" "$NORMAL_SAMPLE" "$TUMOR_SAMPLE" "$FASTA_REF" "$NUM_CORES" "$MAX_MEMORY"
# The above command might not work in some HPC environments. In that case, you can use the following command instead:
singularity exec -c --bind "$WORKING_DIR,$OUTPUT_DIR,$EXTRA_DATA_DIR,$(dirname "$NORMAL_SAMPLE"),$(dirname "$TUMOR_SAMPLE"),$(dirname "$FASTA_REF")" <SINGULARITY_CONTAINER> bash ./executable_scripts/variant_caller_X_X_X.sh "$WORKING_DIR" "$OUTPUT_DIR" "$EXTRA_DATA_DIR" "$REF_VERSION" "$NORMAL_SAMPLE" "$TUMOR_SAMPLE" "$FASTA_REF" "$NUM_CORES" "$MAX_MEMORY"
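When the normal and tumor samples share a directory, the --bind list in the second command repeats that directory. The sketch below builds the list once and removes duplicates, which keeps the command shorter. It is an illustration only: `bind_list` is a hypothetical helper, all paths are made up, and directory names containing commas or newlines are not handled.

```shell
# bind_list DIR...: print the directories as a comma-separated list,
# keeping only the first occurrence of each one.
bind_list() {
    printf '%s\n' "$@" | awk '!seen[$0]++' | paste -sd, -
}

# Hypothetical paths; both samples live in the same directory.
WORKING_DIR=/scratch/work
OUTPUT_DIR=/scratch/out
EXTRA_DATA_DIR=./required_extra_data
NORMAL_SAMPLE=/data/samples/normal.bam
TUMOR_SAMPLE=/data/samples/tumor.bam
FASTA_REF=/data/ref/reference.fasta

BIND="$(bind_list "$WORKING_DIR" "$OUTPUT_DIR" "$EXTRA_DATA_DIR" \
    "$(dirname "$NORMAL_SAMPLE")" "$(dirname "$TUMOR_SAMPLE")" "$(dirname "$FASTA_REF")")"

echo "$BIND"
# prints /scratch/work,/scratch/out,./required_extra_data,/data/samples,/data/ref
```

The resulting value can then be passed as `--bind "$BIND"` in place of the inline list.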