Merge pull request #1613 from lindenb/pl_goleft
added indexcov : finding large INDEL using the BAI index
FriederikeHanssen authored Dec 10, 2024
2 parents 6f3e673 + 6dc5f99 commit 5c6be78
Showing 24 changed files with 901 additions and 130 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ A set of connecting glaciers.

### Added

- [1613](https://github.com/nf-core/sarek/pull/1613) - add indexcov
- [1638](https://github.com/nf-core/sarek/pull/1638) - Added additional documentation detailing ASCAT WES usage.
- [1640](https://github.com/nf-core/sarek/pull/1620) - Add `lofreq` as a tumor-only variant caller
- [1642](https://github.com/nf-core/sarek/pull/1642) - Back to dev
2 changes: 2 additions & 0 deletions README.md
@@ -54,6 +54,7 @@ Depending on the options and samples provided, the pipeline can currently perfor
- `freebayes`
- `GATK HaplotypeCaller`
- `Manta`
- `indexcov`
- `mpileup`
- `MSIsensor-pro`
- `Mutect2`
@@ -171,6 +172,7 @@ We thank the following people for their extensive assistance in the development
- [pallolason](https://github.com/pallolason)
- [Paul Cantalupo](https://github.com/pcantalupo)
- [Phil Ewels](https://github.com/ewels)
- [Pierre Lindenbaum](https://github.com/lindenb)
- [Sabrina Krakau](https://github.com/skrakau)
- [Sam Minot](https://github.com/sminot)
- [Sebastian-D](https://github.com/Sebastian-D)
21 changes: 21 additions & 0 deletions conf/modules/indexcov.config
@@ -0,0 +1,21 @@

// INDEXCOV

process {
    if (params.tools && params.tools.split(',').contains('indexcov')) {

        withName: 'SAMTOOLS_REINDEX_BAM' {
            ext.args = { ' -F 3844 -q 30 ' } // keep MAPQ >= 30; drop unmapped, secondary, QC-fail, duplicate and supplementary reads
        }

        withName: 'GOLEFT_INDEXCOV' {
            publishDir = [
                mode: params.publish_dir_mode,
                path: { "${params.outdir}/variant_calling/indexcov/" }
            ]
        }

    }

}
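
The `-F 3844 -q 30` filter above can be unpacked with a short sketch. The bit values come from the SAM specification; the helper names `decode_flag_mask` and `is_kept` are hypothetical and only mimic, for a single read, what `samtools view -F <mask> -q <mapq>` does.

```python
# SAM flag bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag_mask(mask):
    """Return the names of the flag bits set in `mask`, lowest bit first."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if mask & bit]


def is_kept(flag, mapq, exclude_mask=3844, min_mapq=30):
    """Hypothetical helper: True if a read would pass `-F exclude_mask -q min_mapq`.

    -F drops reads with ANY of the mask bits set; -q drops reads whose
    MAPQ is smaller than the threshold.
    """
    return (flag & exclude_mask) == 0 and mapq >= min_mapq


if __name__ == "__main__":
    print(decode_flag_mask(3844))
    # → ['unmapped', 'secondary', 'qc_fail', 'duplicate', 'supplementary']
```

Note that `-F` only excludes reads; it does not require the `proper_pair` bit, which would need `-f 2`.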
Binary file modified docs/images/sarek_subway.png
233 changes: 145 additions & 88 deletions docs/images/sarek_subway.svg
25 changes: 25 additions & 0 deletions docs/output.md
@@ -45,6 +45,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Strelka](#strelka)
- [Lofreq](#lofreq)
- [Structural Variants](#structural-variants)
- [Indexcov](#indexcov)
- [Manta](#manta)
- [TIDDIT](#tiddit)
- [Sample heterogeneity, ploidy and CNVs](#sample-heterogeneity-ploidy-and-cnvs)
@@ -592,6 +593,30 @@ For further downstream analysis, take a look [here](https://github.com/Illumina/

### Structural Variants

#### indexcov

[indexcov](https://github.com/brentp/goleft/tree/master/indexcov) quickly estimates coverage from a whole-genome BAM or CRAM index.
A BAM index has 16KB resolution, which is used as a coverage estimate.
The output is scaled to around 1, so a long stretch with values of 1.5 would be a heterozygous duplication. This is useful as a quick QC to get coverage values across the genome.
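
As a rough illustration of reading the scaled values, the sketch below looks for long stretches of consecutive bins near 1.5. The function name `find_runs`, the window of 1.3 to 1.7, and the minimum run length of 10 bins are assumptions chosen for the example, not thresholds used by `indexcov`.

```python
BIN_SIZE = 16384  # 16KB resolution of the BAM index, in base pairs


def find_runs(values, low, high, min_bins=10):
    """Yield (start_bin, end_bin_exclusive) for each run of at least
    `min_bins` consecutive values inside the closed interval [low, high]."""
    start = None
    for i, value in enumerate(values):
        inside = low <= value <= high
        if inside and start is None:
            start = i  # a new run begins here
        elif not inside and start is not None:
            if i - start >= min_bins:
                yield (start, i)
            start = None
    # Close a run that extends to the end of the list.
    if start is not None and len(values) - start >= min_bins:
        yield (start, len(values))


if __name__ == "__main__":
    # Toy scaled coverage for one chromosome: 12 bins near 1.5 flanked by normal bins.
    coverage = [1.0] * 5 + [1.5] * 12 + [1.0] * 5
    for start, end in find_runs(coverage, 1.3, 1.7):
        print(start * BIN_SIZE, end * BIN_SIZE)
```

A run found this way is only a candidate region; it would still need to be checked against the interactive HTML plots or another caller.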

**Output directory: `{outdir}/variant_calling/indexcov/`**

In addition to the interactive HTML files, `indexcov` outputs a number of text files:

- `<sample>-indexcov.ped`: a .ped/.fam file with the inferred sex in the appropriate column if the sex chromosomes were found. It also contains the following columns:
  - `CNX` and `CNY`: floating-point estimates of copy number for the sex chromosomes.
  - `bins.out`: number of bins with a coverage value outside of (0.85, 1.15). High values can indicate high-bias samples.
  - `bins.lo`: number of bins with a value < 0.15. High values indicate missing data.
  - `bins.hi`: number of bins with a value > 1.15.
  - `bins.in`: number of bins with a value inside of (0.85, 1.15).
  - `p.out`: `bins.out/bins.in`.
  - `PC1...PC5`: PCA projections calculated with depth of autosomes.
- `<sample>-indexcov.roc`: tab-delimited columns of chrom, scaled coverage cutoff, and one column per sample, each indicating the proportion of 16KB blocks at or above that scaled coverage value.
- `<sample>-indexcov.bed.gz`: a bed file with columns of chrom, start, end, and one column per sample, where the values indicate the scaled coverage for that sample in that 16KB chunk.
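
The bin counters in the `.ped` file can be reproduced from a list of scaled coverage values with a minimal sketch like the one below. It follows the definitions listed above; the function name `summarize_bins`, the treatment of values exactly on a boundary, and the handling of `bins.in == 0` are assumptions, and `indexcov` itself may restrict the calculation to particular chromosomes.

```python
def summarize_bins(values, low=0.85, high=1.15, missing=0.15):
    """Count bins per the definitions above for one sample.

    Boundary values (exactly 0.85 or 1.15) are counted as outside the
    interval here; that choice is an assumption of this sketch.
    """
    bins_in = sum(1 for v in values if low < v < high)
    bins_out = len(values) - bins_in
    bins_lo = sum(1 for v in values if v < missing)
    bins_hi = sum(1 for v in values if v > high)
    # p.out is defined as bins.out / bins.in; guard against division by zero.
    p_out = bins_out / bins_in if bins_in else float("inf")
    return {
        "bins.out": bins_out,
        "bins.lo": bins_lo,
        "bins.hi": bins_hi,
        "bins.in": bins_in,
        "p.out": p_out,
    }


if __name__ == "__main__":
    sample = [1.0, 1.0, 0.9, 1.1, 0.1, 0.5, 1.3, 2.0]
    print(summarize_bins(sample))
    # → {'bins.out': 4, 'bins.lo': 1, 'bins.hi': 2, 'bins.in': 4, 'p.out': 1.0}
```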

#### Manta

[Manta](https://github.com/Illumina/manta) calls structural variants (SVs) and indels from mapped paired-end sequencing reads.
49 changes: 25 additions & 24 deletions docs/usage.md

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions modules.json
@@ -310,6 +310,11 @@
"git_sha": "97321eded31a12598837a476d3615300af413bb7",
"installed_by": ["modules"]
},
"goleft/indexcov": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"lofreq/callparallel": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
6 changes: 6 additions & 0 deletions modules/local/samtools/reindex_bam/environment.yml
@@ -0,0 +1,6 @@
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::samtools=1.20
- bioconda::htslib=1.20
57 changes: 57 additions & 0 deletions modules/local/samtools/reindex_bam/main.nf
@@ -0,0 +1,57 @@
/**
 * Re-index the BAM file for goleft/indexcov without duplicate, supplementary, unmapped, etc. reads.
 * It creates a BAM containing only a header (so indexcov can get the sample name) and a BAM index where low-quality reads, supplementary reads, etc. have been removed.
 */
process SAMTOOLS_REINDEX_BAM {
    tag "$meta.id"
    label 'process_low'

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/samtools:1.20--h50ea8bc_0' :
        'biocontainers/samtools:1.20--h50ea8bc_0' }"

    input:
    tuple val(meta), path(input), path(input_index)
    tuple val(meta2), path(fasta)
    tuple val(meta3), path(fai)

    output:
    tuple val(meta), path("${meta.id}.reindex.bam"), path("${meta.id}.reindex.bam.bai"), emit: output
    path "versions.yml"                                                                 , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def reference = fasta ? "--reference ${fasta}" : ""
    """
    # write header only
    samtools \\
        view \\
        --header-only \\
        --threads ${task.cpus} \\
        -O BAM \\
        -o "${meta.id}.reindex.bam" \\
        ${reference} \\
        ${input}
    # write BAM index only, remove unmapped, supplementary, etc...
    samtools \\
        view \\
        --uncompressed \\
        --write-index \\
        --threads ${task.cpus} \\
        -O BAM \\
        -o "/dev/null##idx##${meta.id}.reindex.bam.bai" \\
        ${reference} \\
        ${args} \\
        ${input}
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    END_VERSIONS
    """
}
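
For readers who want to try the two `samtools view` calls outside Nextflow, the sketch below assembles the same argument lists as the script block above. The function name `build_reindex_commands` and its parameters are hypothetical; the default filter mirrors the `ext.args` value from `conf/modules/indexcov.config`, and nothing is executed by the function itself.

```python
import shlex


def build_reindex_commands(sample_id, alignment, fasta=None,
                           filter_args="-F 3844 -q 30", threads=1):
    """Return (header_cmd, index_cmd) as argument lists mirroring the process script."""
    reference = ["--reference", fasta] if fasta else []
    out_bam = f"{sample_id}.reindex.bam"
    common = ["--threads", str(threads), "-O", "BAM"]

    # 1) Header-only BAM, so indexcov can read the sample name.
    header_cmd = ["samtools", "view", "--header-only", *common,
                  "-o", out_bam, *reference, alignment]

    # 2) Filtered reads go to /dev/null; only the index after '##idx##' is written.
    index_cmd = ["samtools", "view", "--uncompressed", "--write-index", *common,
                 "-o", f"/dev/null##idx##{out_bam}.bai",
                 *reference, *shlex.split(filter_args), alignment]
    return header_cmd, index_cmd


if __name__ == "__main__":
    for cmd in build_reindex_commands("S1", "S1.cram", fasta="genome.fa", threads=2):
        print(shlex.join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` on a machine where `samtools` is installed.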
6 changes: 6 additions & 0 deletions modules/nf-core/goleft/indexcov/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

65 changes: 65 additions & 0 deletions modules/nf-core/goleft/indexcov/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

122 changes: 122 additions & 0 deletions modules/nf-core/goleft/indexcov/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
