SNP calling/filtering: how to modify the ratio of reads to call a SNP and percentage of isolates containing a SNP #12

butterbee · 2025-01-13T08:42:12Z

How do I set the parameters so that a SNP is called only when it passes a particular ratio of reads and is present in only a certain percentage of isolates in a given dataset? I'm not sure what the vSNP3 default values for these parameters are. If there are any, how can I change them when running step 1 or step 2?

stuber · 2025-01-13T11:25:35Z

When running step 2, there are three threshold parameters you may be interested in changing:

-w QUAL_THRESHOLD, --qual_threshold QUAL_THRESHOLD
    Optional: Minimum QUAL threshold for calling a SNP
-x N_THRESHOLD, --n_threshold N_THRESHOLD
    Optional: Minimum N threshold. SNPs between this and qual_threshold are reported as N
-y MQ_THRESHOLD, --mq_threshold MQ_THRESHOLD
    Optional: At least one position per group must have this minimum MQ threshold to be called.

Default values:
-w [150] --> SNP: QUAL >150
-x [50] --> N: QUAL 50-150
-y [56] --> MQ: >56
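
To make the interaction of the three defaults concrete, here is a minimal Python sketch of the decision they describe. It is an illustration only, not vSNP's actual code: the function names are hypothetical, and both the boundary handling (whether 50 and 150 are inclusive) and the "not called" outcome for QUAL below the N threshold are assumptions.

```python
# Defaults as listed above: -w 150, -x 50, -y 56.
QUAL_THRESHOLD = 150
N_THRESHOLD = 50
MQ_THRESHOLD = 56


def classify_position(qual, qual_threshold=QUAL_THRESHOLD, n_threshold=N_THRESHOLD):
    """Classify one position in one sample by its QUAL value (hypothetical helper)."""
    if qual > qual_threshold:
        return "SNP"          # QUAL above -w: reported as a SNP
    if qual >= n_threshold:
        return "N"            # QUAL between -x and -w: reported as N
    return "not called"       # assumption: below -x the position is not reported


def group_passes_mq(mq_values, mq_threshold=MQ_THRESHOLD):
    """True if at least one sample in the group exceeds the MQ threshold at this position."""
    return any(mq > mq_threshold for mq in mq_values)


print(classify_position(200), classify_position(100), classify_position(20))
print(group_passes_mq([40, 58, 30]))
```

Raising `-w` makes SNP calls stricter and widens the band reported as N; lowering `-x` narrows the range of positions that are dropped entirely.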

butterbee · 2025-01-13T11:54:55Z

Thanks for the clarification. This is useful for setting the quality thresholds in step 2.
Regarding SNP filtering, I'm struggling to understand how we can adjust the parameters to include SNPs that are present in only a given percentage (for example, 90%) of isolates. Can we make such modifications in step 1?
Also, can you clarify whether the final SNP alignment produced by step 2 is the core SNP alignment (SNPs that are present in all the isolates)?
Thanks!

stuber · 2025-01-13T12:24:51Z

There is no way to select a percentage of isolates. This is not within vSNP's scope. However, after running step 2 and examining the output SNP table, if you see a group of SNPs in the table that are being called for a subgroup of samples, a position in that group of SNPs can be selected to be used as a defining SNP. Defining SNPs are found in the *define_filter.xlsx dependency file. You can locate this file by showing your reference type locations with the command: vsnp3_path_adder.py -s

There is additional information on adding a defining SNP here:
https://github.com/USDA-VS/vSNP/blob/master/docs/detailed_usage.md#adding-new-groups-or-subgroups
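
Because vSNP does not filter by percentage of isolates, any such filter would have to be applied downstream to the step 2 SNP table. A minimal Python sketch of that idea follows. It is not part of vSNP: the in-memory table layout (position mapped to per-sample base calls), the reference dictionary, and the function names are all hypothetical, and parsing the actual output table is left out.

```python
BASES = {"A", "C", "G", "T"}


def fraction_with_snp(calls, ref_base):
    """Fraction of isolates whose call is an unambiguous base different from the reference."""
    if not calls:
        return 0.0
    snp_count = sum(1 for base in calls.values() if base in BASES and base != ref_base)
    return snp_count / len(calls)


def positions_by_prevalence(table, reference, min_fraction=0.9):
    """Keep positions where at least min_fraction of isolates carry a SNP."""
    return [
        pos for pos, calls in table.items()
        if fraction_with_snp(calls, reference[pos]) >= min_fraction
    ]


# Toy data: position label -> {sample name -> called base}
table = {
    "pos_100": {"s1": "A", "s2": "A", "s3": "A", "s4": "A"},
    "pos_200": {"s1": "T", "s2": "C", "s3": "C", "s4": "N"},
}
reference = {"pos_100": "G", "pos_200": "C"}
print(positions_by_prevalence(table, reference, min_fraction=0.9))
```

Here calls of N count against a position; whether ambiguous calls should instead be excluded from the denominator is a design choice that depends on the analysis.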

The final SNP alignment is not a core SNP alignment. The SNP alignments output for the designated groups only include those SNPs that are parsimony informative. When looking at a group, if the same SNP has occurred in all samples within that group, it will not be shown in the SNP table. vSNP was designed to show differences between datasets. As datasets increase and new outbreaks emerge, new defining SNPs are used to group samples into relatively small subsets so the focus can be on SNP changes specific to an outbreak.
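
The rule described above, that a position is left out of a group's table when every sample in the group carries the same call, can be sketched in Python as follows. This is an illustration under simplifying assumptions, not vSNP's implementation: the table layout and function name are hypothetical, and any special handling of N or ambiguous calls is ignored.

```python
def positions_shown_for_group(table):
    """Return positions where the samples in a group do not all share the same call."""
    return [pos for pos, calls in table.items() if len(set(calls.values())) > 1]


# Toy group: position label -> {sample name -> called base}
group_table = {
    "pos_100": {"s1": "A", "s2": "A", "s3": "A"},  # identical in every sample: hidden
    "pos_200": {"s1": "T", "s2": "C", "s3": "C"},  # differs within the group: shown
}
print(positions_shown_for_group(group_table))
```

This is why the output differs from a core SNP alignment: a SNP shared by the whole group carries no information about differences within that group and is dropped from its table.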
