Allan's changes to nf-core #1

Draft: wants to merge 68 commits into base: nfcore_master
68 commits
658805e
changed runtimes in conf
Sanger-ad7 Oct 19, 2022
cdd0929
increased job timeout
Sanger-ad7 Oct 25, 2022
e457e5e
Added Reference dict to schema
Sanger-ad7 Oct 25, 2022
fc4e2fe
Removed dict from schema
Sanger-ad7 Oct 25, 2022
1e39764
added singularity settings to config
Sanger-ad7 Dec 2, 2022
1e188be
added singularity cacheDir to config
Sanger-ad7 Dec 2, 2022
2b1c59c
added LSF, errorStrategy, cache and Retry setting to base.config
Sanger-ad7 Dec 2, 2022
7ec0642
Commented out time from config
Sanger-ad7 Dec 5, 2022
dcd9784
Commented out process_long from config
Sanger-ad7 Dec 5, 2022
ace8a64
Added java setting for serial GC
Sanger-ad7 Dec 5, 2022
f4aa365
Changed GATK4processes to use 1 cpu and 8G mem initially
Sanger-ad7 Dec 5, 2022
0e9f3cc
changed config to run importgvcf and genotypegvcf in basement
Sanger-ad7 Dec 19, 2022
4d230df
Altered genomicdbimport for running on node /tmp
Sanger-ad7 Jan 5, 2023
fe1a884
escaped some $ signs
Sanger-ad7 Mar 23, 2023
9827477
Modified genomicsdbimport module
Sanger-ad7 Mar 23, 2023
b709d2c
Added /tmp to singularity bind
Sanger-ad7 Mar 23, 2023
d6816df
removed tmp-dir from command
Sanger-ad7 Mar 23, 2023
5756e83
Altered modules to use genDB on /tmp
Sanger-ad7 Mar 24, 2023
4f805e8
Fixed genotypegvcfs module
Sanger-ad7 Mar 24, 2023
3dedf87
Changed gendb name
Sanger-ad7 Mar 24, 2023
ca8c1e3
Corrected --variant parameter
Sanger-ad7 Mar 24, 2023
179149c
Updated vcftools singularity image
Sanger-ad7 Mar 24, 2023
e2d3362
Turned off vcftools due to error
Sanger-ad7 Mar 24, 2023
6d775ad
changed config back
Sanger-ad7 Mar 24, 2023
11b4bf6
Tried to fix vcftools countTsTv non-zero exit
Sanger-ad7 Mar 27, 2023
8b825b6
Tried to fix vcftools countTsTv non-zero exit again
Sanger-ad7 Mar 27, 2023
ba4e1fe
Modified schema for dict file, optimised bed intervals
Sanger-ad7 Mar 29, 2023
35a0778
Added a few more settings for Allele Specific VQSR
Sanger-ad7 Mar 29, 2023
4696dff
changed schema back to before
Sanger-ad7 Mar 29, 2023
2be7c16
changed schema to show dict
Sanger-ad7 Mar 29, 2023
932e23b
added \ to $'s added
Sanger-ad7 Mar 29, 2023
cbff3d0
Changed deepvariant to low requirements
Sanger-ad7 Mar 30, 2023
300b63d
Added lsf.config
Sanger-ad7 Jul 10, 2023
1ccb52e
Altered schema
Sanger-ad7 Jul 11, 2023
7f56b81
Altered schema
Sanger-ad7 Jul 11, 2023
7f3a474
Added Ref files to test_full_germline.config
Sanger-ad7 Jul 11, 2023
fd802c8
Added for config settings to test_full_germline profile
Sanger-ad7 Jul 11, 2023
95a93b9
Added description of shard calculation
Sanger-ad7 Jul 11, 2023
f3413d6
Added description of intervals to use
Sanger-ad7 Jul 11, 2023
4a2bc15
Added description of intervals to use
Sanger-ad7 Jul 11, 2023
5ffab0a
Corrected format in lsf.conf
Sanger-ad7 Jul 11, 2023
4ae842e
Made joint call default
Sanger-ad7 Jul 11, 2023
6f67ddb
Made joint call default
Sanger-ad7 Jul 11, 2023
7b9698c
Made joint call default
Sanger-ad7 Jul 11, 2023
103ff6b
Made joint call default
Sanger-ad7 Jul 11, 2023
00fe590
Made joint call default
Sanger-ad7 Jul 11, 2023
ad57276
schema change
Sanger-ad7 Jul 12, 2023
89f7281
schema change
Sanger-ad7 Jul 12, 2023
4b4f6ef
changed nucleotides per second default
Sanger-ad7 Jul 12, 2023
1f102af
changed nucleotides per second default
Sanger-ad7 Jul 12, 2023
2fcc82d
changed nucleotides per second default in test_full_germline.config
Sanger-ad7 Jul 12, 2023
9c2ac27
Publish GVCF while joint calling
Sanger-ad7 Sep 27, 2023
3db3362
Merge vcf even if joint calling
Sanger-ad7 Sep 28, 2023
6b54d73
removed GATK_SINGLE_SAMPLE_GERMLINE_VARIANT_CALLING from haplotypecaller
Sanger-ad7 Sep 28, 2023
a02418b
Deleted VQSR MERGE and output after ApplyVQSR of indels to snpVQSR input
Sanger-ad7 Sep 28, 2023
0a21435
Reverted Deletion of VQSR MERGE
Sanger-ad7 Sep 28, 2023
7e71c45
made snp applyvqsr the input for indel applyvqsr
Sanger-ad7 Sep 29, 2023
97729cb
made snp applyvqsr the input for indel applyvqsr
Sanger-ad7 Sep 29, 2023
5605ad2
Trying to get applyVQSR to work for indels
Sanger-ad7 Sep 29, 2023
b39dc4d
incorporated changes to avoid VQSR duplications
Sanger-ad7 Oct 26, 2023
42d46f8
Removed VCF filtering
Sanger-ad7 Oct 26, 2023
e3cc2b3
Fixed edit error
Sanger-ad7 Oct 26, 2023
0963b35
Fixed output variable
Sanger-ad7 Oct 26, 2023
bc720e4
Changed config to publish VQSR from APPLYVQSR_INDEL
Sanger-ad7 Oct 26, 2023
af33661
Added variantrecalibrator plot and publish
Sanger-ad7 Oct 26, 2023
d2339ac
Changed publish pattern
Sanger-ad7 Oct 26, 2023
ce60ca2
Added pdf emit to variantrecalibrator
Sanger-ad7 Oct 26, 2023
435c6ea
Updated GATK version from 4.2.6.1 to 4.4.0.0
Sanger-ad7 Feb 22, 2024
59 changes: 40 additions & 19 deletions conf/base.config
@@ -9,41 +9,42 @@
 */
 
 process {
+    cache = 'lenient'
     cpus = { check_max( 1 * task.attempt, 'cpus' ) }
-    memory = { check_max( 6.GB * task.attempt, 'memory' ) }
-    time = { check_max( 4.h * task.attempt, 'time' ) }
+    memory = { check_max( 8.GB * task.attempt, 'memory' ) }
+    //time = { check_max( 12.h * task.attempt, 'time' ) }
     shell = ['/bin/bash', '-euo', 'pipefail']
 
     // memory errors which should be retried. otherwise error out
-    errorStrategy = { task.exitStatus in [143,137,104,134,139,140,247] ? 'retry' : 'finish' }
-    maxRetries = 1
+    errorStrategy = { task.attempt <= 3 ? 'retry' : 'ignore' }
+    maxRetries = 4
     maxErrors = '-1'
 
     // Process-specific resource requirements
     // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
     withLabel:process_single {
         cpus = { check_max( 1 , 'cpus' ) }
-        memory = { check_max( 6.GB * task.attempt, 'memory' ) }
-        time = { check_max( 4.h * task.attempt, 'time' ) }
+        memory = { check_max( 8.GB * task.attempt, 'memory' ) }
+        //time = { check_max( 12.h * task.attempt, 'time' ) }
     }
     withLabel:process_low {
         cpus = { check_max( 2 * task.attempt, 'cpus' ) }
         memory = { check_max( 12.GB * task.attempt, 'memory' ) }
-        time = { check_max( 4.h * task.attempt, 'time' ) }
+        //time = { check_max( 12.h * task.attempt, 'time' ) }
     }
     withLabel:process_medium {
         cpus = { check_max( 6 * task.attempt, 'cpus' ) }
         memory = { check_max( 36.GB * task.attempt, 'memory' ) }
-        time = { check_max( 8.h * task.attempt, 'time' ) }
+        //time = { check_max( 12.h * task.attempt, 'time' ) }
     }
     withLabel:process_high {
         cpus = { check_max( 12 * task.attempt, 'cpus' ) }
         memory = { check_max( 72.GB * task.attempt, 'memory' ) }
-        time = { check_max( 16.h * task.attempt, 'time' ) }
-    }
-    withLabel:process_long {
-        time = { check_max( 20.h * task.attempt, 'time' ) }
+        //time = { check_max( 48.h * task.attempt, 'time' ) }
     }
+    //withLabel:process_long {
+    //time = { check_max( 48.h * task.attempt, 'time' ) }
+    //}
     withLabel:process_high_memory {
         memory = { check_max( 200.GB * task.attempt, 'memory' ) }
     }
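To make the effect of the new retry settings concrete, here is a minimal Python sketch (not Nextflow code) of the memory a persistently failing task would request on each attempt. It assumes that nf-core's check_max() simply caps a request at params.max_memory; the 128 GB cap used as a default below is a hypothetical value, not one taken from this PR.

```python
from typing import List, Tuple

# Hypothetical stand-in for params.max_memory (assumption, not from this PR).
MAX_MEMORY_GB = 128.0


def check_max(value_gb: float, limit_gb: float) -> float:
    """Assumed behaviour of nf-core's check_max(): never exceed the configured cap."""
    return min(value_gb, limit_gb)


def error_strategy(attempt: int) -> str:
    """Mirror of: errorStrategy = { task.attempt <= 3 ? 'retry' : 'ignore' }"""
    return "retry" if attempt <= 3 else "ignore"


def attempt_schedule(base_memory_gb: float,
                     max_retries: int = 4,
                     max_memory_gb: float = MAX_MEMORY_GB) -> Tuple[List[float], str]:
    """Memory requested per attempt for a task that fails every time, plus the final action."""
    schedule: List[float] = []
    attempt = 1
    while True:
        # memory = { check_max( base.GB * task.attempt, 'memory' ) }
        schedule.append(check_max(base_memory_gb * attempt, max_memory_gb))
        action = error_strategy(attempt)
        if action != "retry":
            return schedule, action
        # attempt - 1 retries have already been used at this point.
        if attempt - 1 >= max_retries:
            return schedule, "maxRetries exhausted"
        attempt += 1


if __name__ == "__main__":
    # process_single / GATK4 defaults in this PR: 8.GB * task.attempt
    print(attempt_schedule(8))
```

Because the closure returns 'ignore' on the fourth attempt, maxRetries = 4 is never the limiting factor, and the retry now happens whatever the exit status. With the previous maxRetries = 1, a failing task got at most two attempts.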
@@ -71,12 +72,18 @@ process {
         memory = { check_max( 30.GB * task.attempt, 'memory' ) }
     }
     withName: 'GATK4_MARKDUPLICATES|GATK4_MARKDUPLICATESSPARK' {
-        cpus = { check_max( 6 * task.attempt, 'cpus' ) }
-        memory = { check_max( 30.GB * task.attempt, 'memory' ) }
+        cpus = { check_max( 1 , 'cpus' ) }
+        memory = { check_max( 8.GB * task.attempt, 'memory' ) }
     }
-    withName:'GATK4_APPLYBQSR|GATK4_APPLYBQSR_SPARK|GATK4_BASERECALIBRATOR|GATK4_BASERECALIBRATOR_SPARK|GATK4_GATHERBQSRREPORTS'{
-        cpus = { check_max( 2 * task.attempt, 'cpus' ) }
-        memory = { check_max( 4.GB * task.attempt, 'memory' ) }
+    withName: 'GATK4_GENOMICSDBIMPORT|GATK4_GENOTYPEGVCFS' {
+        cpus = { check_max( 1 , 'cpus' ) }
+        memory = { check_max( 12.GB * task.attempt, 'memory' ) }
+        queue = 'basement'
+    }
+
+    withName:'GATK4_HAPLOTYPECALLER|GATK4_APPLYBQSR|GATK4_APPLYBQSR_SPARK|GATK4_BASERECALIBRATOR|GATK4_BASERECALIBRATOR_SPARK|GATK4_GATHERBQSRREPORTS'{
+        cpus = { check_max( 1 , 'cpus' ) }
+        memory = { check_max( 8.GB * task.attempt, 'memory' ) }
     }
     withName:'MOSDEPTH'{
         cpus = { check_max( 4 * task.attempt, 'cpus' ) }
@@ -94,11 +101,25 @@ process {
         memory = { check_max( 4.GB * task.attempt, 'memory' ) }
     }
     withName:'GATK4_MERGEVCFS'{
-        cpus = { check_max( 2 * task.attempt, 'cpus' ) }
-        memory = { check_max( 4.GB * task.attempt, 'memory' ) }
+        cpus = { check_max( 1 , 'cpus' ) }
+        memory = { check_max( 8.GB * task.attempt, 'memory' ) }
     }
     withName: 'MULTIQC' {
         cpus = { check_max( 4 * task.attempt, 'cpus' ) }
         memory = { check_max( 12.GB * task.attempt, 'memory' ) }
     }
 }
+
+executor {
+    name = 'lsf'
+    queueSize = 4000
+    poolSize = 4
+    submitRateLimit = '10 sec'
+    killBatchSize = 50
+    pollInterval = '10 sec'
+    queueStatInterval = '20 sec'
+    dumpInterval = '10 sec'
+    exitReadTimeout= '10 sec'
+    perJobMemLimit=true
+}
+
30 changes: 30 additions & 0 deletions conf/lsf.conf
@@ -0,0 +1,30 @@
+process {
+    cache = 'lenient'
+    executor = 'lsf'
+    shell = ['/bin/bash', '-euo', 'pipefail']
+}
+
+executor {
+    name = 'lsf'
+    queueSize = 4000
+    poolSize = 4
+    submitRateLimit = '10 sec'
+    killBatchSize = 50
+    pollInterval = '10 sec'
+    queueStatInterval = '20 sec'
+    dumpInterval = '10 sec'
+    exitReadTimeout= '10 sec'
+    perJobMemLimit=true
+}
+
+docker {
+    enabled = false
+}
+
+singularity {
+    enabled = true
+    autoMounts = true
+    cacheDir = '/lustre/scratch118/humgen/resources/containers/'
+    runOptions = '--dns 172.18.255.1,172.18.255.2,172.18.255.3'
+    envWhitelist = 'HOSTNAME,SSH_CONNECTION,SSH_CLIENT,CVS_RSH,http_proxy,https_proxy,HTTP_PROXY,HTTPS_PROXY'
+}
29 changes: 19 additions & 10 deletions conf/modules.config
@@ -727,11 +727,11 @@ process{
     }
 
     withName: 'HAPLOTYPECALLER' {
-        ext.args = { params.joint_germline ? "-ERC GVCF" : "" }
+        ext.args = { params.joint_germline ? "-ERC GVCF -G StandardAnnotation -G AS_StandardAnnotation -G StandardHCAnnotation" : "" }
         ext.prefix = { meta.num_intervals <= 1 ? ( params.joint_germline ? "${meta.id}.haplotypecaller.g" : "${meta.id}.haplotypecaller" ) : ( params.joint_germline ? "${meta.id}.haplotypecaller.${intervals.simpleName}.g" :"${meta.id}.haplotypecaller.${intervals.simpleName}" ) }
         ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') }
         publishDir = [
-            enabled: !params.joint_germline,
+            //enabled: !params.joint_germline,
             mode: params.publish_dir_mode,
             path: { "${params.outdir}/variant_calling/"},
             pattern: "*{vcf.gz,vcf.gz.tbi}",
@@ -762,6 +762,7 @@ process{
     }
 
     withName: 'GATK4_GENOTYPEGVCFS' {
+        ext.args = { "-G StandardAnnotation -G AS_StandardAnnotation" }
         ext.prefix = { meta.num_intervals > 1 ? meta.intervals_name : "joint_germline" }
     }
 
@@ -784,17 +785,19 @@ process{
 
     withName: 'VARIANTRECALIBRATOR_INDEL' {
         ext.prefix = { "${meta.id}_INDEL" }
-        ext.args = "-an QD -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -mode INDEL"
+        ext.args = "-AS -an QD -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -mode INDEL"
         publishDir = [
             enabled: false
         ]
     }
 
     withName: 'VARIANTRECALIBRATOR_SNP' {
         ext.prefix = { "${meta.id}_SNP" }
-        ext.args = "-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP"
+        ext.args = "-AS -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP"
         publishDir = [
-            enabled: false
+            mode: params.publish_dir_mode,
+            path: { "${params.outdir}/variant_calling/haplotypecaller/joint_variant_calling/"},
+            pattern: "*{R.pdf,R}"
         ]
     }
 
@@ -804,12 +807,8 @@ process{
     }
 
     withName: 'GATK4_APPLYVQSR_INDEL'{
-        ext.prefix = { "${meta.id}_INDEL" }
+        ext.prefix = { "joint_germline_recalibrated" }
         ext.args = '--truth-sensitivity-filter-level 99.9 -mode INDEL'
-    }
-
-    withName: 'MERGE_VQSR' {
-        ext.prefix = "joint_germline_recalibrated"
         publishDir = [
             mode: params.publish_dir_mode,
             path: { "${params.outdir}/variant_calling/haplotypecaller/joint_variant_calling/"},
@@ -818,6 +817,16 @@ process{
         ]
     }
 
+    // withName: 'MERGE_VQSR' {
+    // ext.prefix = "joint_germline_recalibrated"
+    // publishDir = [
+    // mode: params.publish_dir_mode,
+    // path: { "${params.outdir}/variant_calling/haplotypecaller/joint_variant_calling/"},
+    // saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+    // pattern: "*{vcf.gz,vcf.gz.tbi}"
+    // ]
+    // }
+
     // MANTA
     withName: 'MERGE_MANTA.*' {
         publishDir = [
7 changes: 6 additions & 1 deletion conf/test_full_germline.config
@@ -15,10 +15,15 @@ params {
     config_profile_description = 'Full test dataset to check germline VC pipeline function'
 
     // Input data for full size test
-    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/sarek/testdata/csv/NA12878_WGS_30x_full_test.csv'
+    input = '/lustre/scratch123/hgi/teams/hgi/nextflow_ci_staging/novaseq_eval/novaseq.csv'
 
     // Other params
     tools = 'strelka,freebayes,haplotypecaller,deepvariant,manta,tiddit,cnvkit,vep'
+    fasta = '/lustre/scratch125/humgen/resources/ref/Homo_sapiens/HS38DH/hs38DH.fa'
+    fasta_fai = '/lustre/scratch125/humgen/resources/ref/Homo_sapiens/HS38DH/hs38DH.fa.fai'
+    dict = '/lustre/scratch125/humgen/resources/ref/Homo_sapiens/HS38DH/hs38DH.dict'
+    intervals = '/lustre/scratch125/humgen/resources/ref/Homo_sapiens/HS38DH/wgs_hg38.even.handcurated.20k.FINAL.bed'
+    nucleotides_per_second = 40000
 
     split_fastq = 50000000
 }
3 changes: 2 additions & 1 deletion modules/local/create_intervals_bed/main.nf
@@ -28,11 +28,12 @@ process CREATE_INTERVALS_BED {
                 # no runtime estimate in this row, assume default value
                 t = (\$3 - \$2) / ${params.nucleotides_per_second}
             }
-            if (name == "" || (chunk > 600 && (chunk + t) > longest * 1.05)) {
+            if (name == "" || (chunk > 600 && (chunk + t) > longest * 1.00) || \$1 != chr ) {
                 # start a new chunk
                 name = sprintf("%s_%d-%d.bed", \$1, \$2+1, \$3)
                 chunk = 0
                 longest = 0
+                chr = \$1
             }
             if (t > longest)
                 longest = t
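The interval-splitting rule changed in CREATE_INTERVALS_BED can be sketched in Python as follows. This mirrors only the lines visible in the diff and assumes that the part of the awk script not shown adds t to chunk and writes each row into the file called name. The function name chunk_intervals and its slack and split_on_chrom parameters are illustrative, not part of the pipeline.

```python
from typing import Dict, List, Optional, Tuple

# (chrom, 0-based start, end, optional runtime estimate in seconds)
Row = Tuple[str, int, int, Optional[float]]


def chunk_intervals(rows: List[Row],
                    nucleotides_per_second: float,
                    slack: float = 1.00,
                    split_on_chrom: bool = True) -> Dict[str, List[Row]]:
    """Group BED rows into chunk files, mirroring the awk rule.

    slack=1.00, split_on_chrom=True models the rule in this PR;
    slack=1.05, split_on_chrom=False models the previous rule.
    """
    chunks: Dict[str, List[Row]] = {}
    name, chunk, longest = "", 0.0, 0.0
    current_chrom: Optional[str] = None
    for row in rows:
        chrom, start, end, runtime = row
        # No runtime estimate in this row: derive one from the interval length.
        t = runtime if runtime is not None else (end - start) / nucleotides_per_second
        if (name == ""
                or (chunk > 600 and chunk + t > longest * slack)
                or (split_on_chrom and chrom != current_chrom)):
            # Start a new chunk, named after its first interval (1-based start).
            name = f"{chrom}_{start + 1}-{end}.bed"
            chunk, longest, current_chrom = 0.0, 0.0, chrom
        longest = max(longest, t)
        chunk += t  # assumption: the unshown part of the script accumulates t
        chunks.setdefault(name, []).append(row)
    return chunks


if __name__ == "__main__":
    rows = [("chr1", 0, 100000, None), ("chr1", 100000, 800000, None),
            ("chr1", 800000, 900000, None), ("chr2", 0, 50000, None)]
    print(list(chunk_intervals(rows, 1000)))
    print(list(chunk_intervals(rows, 1000, slack=1.05, split_on_chrom=False)))
```

With nucleotides_per_second = 40000, as set in test_full_germline.config, a 1 Mb interval without a runtime estimate is assigned t = 25 s. Running the sketch with slack=1.05 and split_on_chrom=False shows that, under the old rule, the chr2 row is appended to a chunk that started on chr1; the new chromosome check prevents that.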
2 changes: 1 addition & 1 deletion modules/nf-core/modules/deepvariant/main.nf

6 changes: 3 additions & 3 deletions modules/nf-core/modules/gatk4/applybqsr/main.nf

2 changes: 1 addition & 1 deletion modules/nf-core/modules/gatk4/applybqsrspark/main.nf

6 changes: 3 additions & 3 deletions modules/nf-core/modules/gatk4/applyvqsr/main.nf

6 changes: 3 additions & 3 deletions modules/nf-core/modules/gatk4/baserecalibrator/main.nf

6 changes: 3 additions & 3 deletions modules/nf-core/modules/gatk4/calculatecontamination/main.nf

2 changes: 1 addition & 1 deletion modules/nf-core/modules/gatk4/cnnscorevariants/main.nf
