Add ganon custombuild #55

Merged · 11 commits · Nov 28, 2024
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -40,6 +40,10 @@

 > Lu, J., Breitwieser, F. P., Thielen, P., & Salzberg, S. L. (2017). Bracken: estimating species abundance in metagenomics data. PeerJ. Computer Science, 3(e104), e104. https://doi.org/10.7717/peerj-cs.104

+- [ganon](https://doi.org/10.1093/bioinformatics/btaa458)
+
+> Piro, V. C., Dadi, T. H., Seiler, E., Reinert, K., & Renard, B. Y. (2020). Ganon: Precise metagenomics classification against large and up-to-date sets of reference sequences. Bioinformatics (Oxford, England), 36(Suppl_1), i12–i20. https://doi.org/10.1093/bioinformatics/btaa458
+
 - [Centrifuge](https://doi.org/10.1101/gr.210641.116)

 > Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Research, 26(12), 1721–1729. https://doi.org/10.1101/gr.210641.116
1 change: 1 addition & 0 deletions README.md
@@ -34,6 +34,7 @@
 2. Builds databases for:
    - [Bracken](https://doi.org/10.7717/peerj-cs.104)
    - [Centrifuge](https://doi.org/10.1101/gr.210641.116)
+   - [ganon](https://doi.org/10.1093/bioinformatics/btaa458)
    - [DIAMOND](https://doi.org/10.1038/nmeth.3176)
    - [Kaiju](https://doi.org/10.1038/ncomms11257)
    - [Kraken2](https://doi.org/10.1186/s13059-019-1891-0)
8 changes: 6 additions & 2 deletions conf/modules.config
@@ -18,8 +18,8 @@ process {
         saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
     ]

-    withName: 'MULTIQC' {
-        ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
+    withName: MULTIQC {
+        ext.args = { params.multiqc_title ? "--title \"${params.multiqc_title}\"" : '' }
         publishDir = [
             path: { "${params.outdir}/multiqc" },
             mode: params.publish_dir_mode,
@@ -47,6 +47,10 @@
         ]
     }

+    withName: GANON_BUILD {
+        ext.args = { "--verbose" }
+    }
+
     withName: MALT_BUILD {
         ext.args = { "--sequenceType ${params.malt_sequencetype}" }
     }
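The `--verbose` flag above is pinned in the pipeline's module configuration. Users who want to pass extra options to `ganon build-custom` can override `ext.args` with a run-time custom config, as in this minimal sketch (the `custom.config` file name and the extra `--threads` flag are illustrative, not part of this PR, and the run still needs whatever reference and taxonomy parameters your build requires):

```bash
# Sketch: override GANON_BUILD's ext.args at run time with a custom config.
# custom.config (hypothetical) would contain, in Nextflow config syntax:
#   process { withName: GANON_BUILD { ext.args = '--verbose --threads 8' } }
nextflow run nf-core/createtaxdb \
    -profile docker \
    --input samplesheet.csv \
    --outdir results \
    -c custom.config
```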
1 change: 1 addition & 0 deletions conf/test.config
@@ -31,6 +31,7 @@ params {

     build_bracken = true
     build_diamond = true
+    build_ganon = true
     build_kaiju = true
     build_malt = true
     build_centrifuge = true
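With `build_ganon = true` in the test profile, the ganon build path is exercised by the standard nf-core small-scale test invocation, sketched below (assuming Docker; any supported container engine profile works):

```bash
# Run the pipeline's bundled small test profile, which now also builds a ganon database.
nextflow run nf-core/createtaxdb -profile test,docker --outdir results
```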
17 changes: 9 additions & 8 deletions conf/test_full.config
@@ -17,13 +17,14 @@ params {
     // Input data for full size test
     // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
     // TODO nf-core: Give any required params for the test so that command line flags are not needed
-    input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'
+    input            = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'

-    build_bracken = true
-    build_diamond = true
-    build_kaiju = true
-    build_malt = true
-    build_centrifuge = true
-    build_kraken2 = true
-    build_krakenuniq = true
+    build_bracken    = true
+    build_diamond    = true
+    build_ganon      = true
+    build_kaiju      = true
+    build_malt       = true
+    build_centrifuge = true
+    build_kraken2    = true
+    build_krakenuniq = true
 }
1 change: 1 addition & 0 deletions conf/test_nothing.config
@@ -22,6 +22,7 @@ params {

     build_bracken = false
     build_diamond = false
+    build_ganon = false
     build_kaiju = false
     build_malt = false
     build_centrifuge = false
17 changes: 16 additions & 1 deletion docs/output.md
@@ -14,7 +14,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
-- [Bracken](#bracken) - Database files for Brakcen
+- [Bracken](#bracken) - Database files for Bracken
+- [ganon](#ganon) - Database files for ganon
 - [Centrifuge](#centrifuge) - Database files for Centrifuge
 - [DIAMOND](#diamond) - Database files for DIAMOND
 - [Kaiju](#kaiju) - Database files for Kaiju
@@ -92,6 +93,20 @@ The resulting `<db_name>/` directory can be given to Bracken itself with `bracke

 A directory and `cf` files can be given to the Centrifuge command with `centrifuge -x /<path>/<to>/<cf_files_basename>` etc.

+### Ganon
+
+[ganon](https://github.com/pirovc/ganon/) efficiently classifies genomic sequences against large sets of reference sequences. It integrates database download and update (RefSeq/GenBank), taxonomic profiling (NCBI/GTDB), binning and hierarchical classification, customized reporting, and more.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `ganon/`
+  - `<database>.hibf`: main bloom filter index file
+  - `<database>.tax`: taxonomy tree used for taxonomy assignment
+
+</details>
+
+The directory containing these two files can be given to ganon using the database name (the file name without extensions) as a prefix, e.g. `ganon classify -d /<path>/<to>/<database>`.

 ### Diamond

 [DIAMOND](https://github.com/bbuchfink/diamond) is an accelerated BLAST-compatible local sequence aligner, particularly used for protein alignment.
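A fuller usage sketch for the ganon files documented above (the paths, read file name, and output prefix are placeholders; check `ganon classify --help` for the authoritative option list):

```bash
# Sketch: classify single-end reads against the ganon database built by the pipeline.
# The --db-prefix is the shared path/name of the .hibf and .tax files, without extension.
ganon classify \
    --db-prefix results/ganon/database \
    --single-reads sample.fastq.gz \
    --output-prefix sample_profile
```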
89 changes: 40 additions & 49 deletions main.nf
@@ -15,64 +15,20 @@
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */

-include { CREATETAXDB } from './workflows/createtaxdb'
+include { CREATETAXDB } from './workflows/createtaxdb'
 include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_createtaxdb_pipeline'
 include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_createtaxdb_pipeline'

-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-    NAMED WORKFLOWS FOR PIPELINE
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-//
-// WORKFLOW: Run main analysis pipeline depending on type of input
-//
-workflow NFCORE_CREATETAXDB {
-
-    take:
-    samplesheet // channel: samplesheet read in from --input
-
-    main:
-
-    //
-    // WORKFLOW: Run pipeline
-    //
-    ch_samplesheet = samplesheet
-    ch_taxonomy_namesdmp = file(params.namesdmp)
-    ch_taxonomy_nodesdmp = file(params.nodesdmp)
-    ch_accession2taxid = file(params.accession2taxid)
-    ch_nucl2taxid = file(params.nucl2taxid)
-    ch_prot2taxid = file(params.prot2taxid)
-    ch_malt_mapdb = file(params.malt_mapdb)
-
-
-    CREATETAXDB (
-        ch_samplesheet,
-        ch_taxonomy_namesdmp,
-        ch_taxonomy_nodesdmp,
-        ch_accession2taxid,
-        ch_nucl2taxid,
-        ch_prot2taxid,
-        ch_malt_mapdb,
-
-    )
-    emit:
-    multiqc_report = CREATETAXDB.out.multiqc_report // channel: /path/to/multiqc_report.html
-}
 /*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     RUN MAIN WORKFLOW
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */

 workflow {

     main:
     //
     // SUBWORKFLOW: Run initialisation tasks
     //
-    PIPELINE_INITIALISATION (
+    PIPELINE_INITIALISATION(
         params.version,
         params.validate_params,
         params.monochrome_logs,
@@ -84,13 +40,13 @@
     //
     // WORKFLOW: Run main workflow
     //
-    NFCORE_CREATETAXDB (
+    NFCORE_CREATETAXDB(
         PIPELINE_INITIALISATION.out.samplesheet
     )
     //
     // SUBWORKFLOW: Run completion tasks
     //
-    PIPELINE_COMPLETION (
+    PIPELINE_COMPLETION(
         params.email,
         params.email_on_fail,
         params.plaintext_email,
@@ -103,6 +59,41 @@

 /*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-    THE END
+    NAMED WORKFLOWS FOR PIPELINE
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
+
+//
+// WORKFLOW: Run main analysis pipeline depending on type of input
+//
+workflow NFCORE_CREATETAXDB {
+    take:
+    samplesheet // channel: samplesheet read in from --input
+
+    main:
+
+    //
+    // WORKFLOW: Run pipeline
+    //
+    ch_samplesheet = samplesheet
+    ch_taxonomy_namesdmp = file(params.namesdmp, checkIfExists: true)
+    ch_taxonomy_nodesdmp = file(params.nodesdmp, checkIfExists: true)
+    ch_accession2taxid = file(params.accession2taxid, checkIfExists: true)
+    ch_nucl2taxid = file(params.nucl2taxid, checkIfExists: true)
+    ch_prot2taxid = file(params.prot2taxid, checkIfExists: true)
+    ch_malt_mapdb = file(params.malt_mapdb, checkIfExists: true)
+
+
+    CREATETAXDB(
+        ch_samplesheet,
+        ch_taxonomy_namesdmp,
+        ch_taxonomy_nodesdmp,
+        ch_accession2taxid,
+        ch_nucl2taxid,
+        ch_prot2taxid,
+        ch_malt_mapdb
+    )
+
+    emit:
+    multiqc_report = CREATETAXDB.out.multiqc_report // channel: /path/to/multiqc_report.html
+}
5 changes: 5 additions & 0 deletions modules.json
@@ -30,6 +30,11 @@
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"ganon/buildcustom": {
"branch": "master",
"git_sha": "4265ef4b3b9af8877671715b081f102041c64cfd",
"installed_by": ["modules"]
},
"gunzip": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
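Entries like this are normally written by nf-core/tools rather than edited by hand; a sketch of the command that would have produced it (assuming nf-core/tools is installed and run from the pipeline root):

```bash
# Sketch: install the ganon/buildcustom module from nf-core/modules,
# which records the branch and git_sha in modules.json automatically.
nf-core modules install ganon/buildcustom
```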
5 changes: 5 additions & 0 deletions modules/nf-core/ganon/buildcustom/environment.yml


60 changes: 60 additions & 0 deletions modules/nf-core/ganon/buildcustom/main.nf


77 changes: 77 additions & 0 deletions modules/nf-core/ganon/buildcustom/meta.yml
