How to prevent ApplyBQSR from passing intervals to prevent it from dropping reads. #1772

Open
Mani-varma1 opened this issue Jan 15, 2025 · 2 comments

Comments

@Mani-varma1

Hi

I have a small problem during the base recalibration step: it is dropping some of the reads that fall outside the interval regions I provide. Our panel currently sequences about 150 genes. For our internal QC, we check that GC content does not exceed a certain threshold. However, some of the regions in our interval file exceed this threshold, so we also sequence bases in the adjacent regions as a "proxy" to check for any sequencing issues. We use these adjacent regions only as a QC check and do not want to call any variants in them.

However, the way base recalibration is currently set up, both training the model and applying the scores use this interval file, which causes reads that fall outside these regions to be dropped. Ideally, I do not want to remove any reads in any part of the pipeline. Is there an easier way to remove an argument using the config file? For now I have manually modified the code to remove the interval argument, which works as expected.

so from

    gatk --java-options "-Xmx${avail_mem}M -XX:-UsePerfData" \\
        ApplyBQSR \\
        --input $input \\
        --output ${prefix}.${input.getExtension()} \\
        --reference $fasta \\
        --bqsr-recal-file $bqsr_table \\
        $interval_command \\
        --tmp-dir . \\
        $args

to

    gatk --java-options "-Xmx${avail_mem}M -XX:-UsePerfData" \\
        ApplyBQSR \\
        --input $input \\
        --output ${prefix}.${input.getExtension()} \\
        --reference $fasta \\
        --bqsr-recal-file $bqsr_table \\
        --tmp-dir . \\
        $args

I could also change the intervals to include these regions and add extra padding, but ideally I don't want any reads ever dropped, in case we have to investigate them.

Note that this step (passing intervals to ApplyBQSR) is also not recommended by the Broad Institute team;
please see:

@FriederikeHanssen
Contributor

Hey! You can run sarek with --no_intervals to skip this entirely. To then do variant calling with intervals, you could restart with --step variant_calling.
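As a rough sketch of that two-run approach (not a tested recipe): the file names, the docker profile, and the location of the restart samplesheet under the output directory are assumptions for illustration, so check them against your own setup. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths (assumptions, not taken from this thread).
SAMPLESHEET="samplesheet.csv"
OUTDIR="results"
PANEL_BED="panel.bed"

# Print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Run 1: preprocessing without intervals, so no reads are dropped
# during base recalibration.
run nextflow run nf-core/sarek -profile docker \
    --input "$SAMPLESHEET" --outdir "$OUTDIR" --no_intervals

# Run 2: restart at variant calling, now restricted to the panel.
# The CSV path below is assumed to be where run 1 wrote its
# restart samplesheet; verify it in your output directory.
run nextflow run nf-core/sarek -profile docker \
    --step variant_calling \
    --input "$OUTDIR/csv/recalibrated.csv" \
    --intervals "$PANEL_BED" --outdir "$OUTDIR"
```

Note that with --no_intervals the recalibration model in run 1 is also trained without intervals, which may not be what you want for a targeted panel.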

@Mani-varma1
Author

Thanks for getting back to me. It is still recommended to use the intervals to train the base recalibration model, so I would like to train the model on the intervals and then apply the scores to all the reads. I am guessing I have to run it with intervals up to the base recalibrator step (not sure if it does both training and ApplyBQSR?), restart it at the base recalibrator step (not sure how I would start strictly at the ApplyBQSR step in this instance), and then restart again at variant calling.
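Outside of sarek, what I am after would look roughly like the sketch below: train with the panel intervals, then apply the table without any interval argument. All file names are placeholders I made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder file names (assumptions, not taken from the pipeline).
BAM="sample.md.bam"
FASTA="genome.fasta"
KNOWN_SITES="dbsnp.vcf.gz"
PANEL_BED="panel.bed"
TABLE="sample.recal.table"

# Print each command; execute it only when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: train the recalibration model on the panel intervals only.
run gatk BaseRecalibrator \
    --input "$BAM" \
    --reference "$FASTA" \
    --known-sites "$KNOWN_SITES" \
    --intervals "$PANEL_BED" \
    --output "$TABLE"

# Step 2: apply the model with no --intervals argument, so reads
# outside the panel are recalibrated and written out as well.
run gatk ApplyBQSR \
    --input "$BAM" \
    --reference "$FASTA" \
    --bqsr-recal-file "$TABLE" \
    --output "sample.recal.bam"
```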

In terms of restarting from a previous run, how does it work? Do we treat the runs as separate commands/analyses and run them in isolation, or is there a way to let sarek know about the previous steps already undertaken?
