how to limit the number of variants used in step 1 #587
Comments
In your post:
Were those 471,762 genotyped SNPs used in both step 1 and step 2, or only in step 1?
Hi, you could also perform LD pruning to reduce the number of variants used in step 1, in addition to using more stringent QC parameters. Regarding the analysis in the paper: the 471,762 variants were used in step 1, and in step 2 we tested those variants as well as a larger set of imputed variants (using the same step 1 output file). Cheers,
Thank you so much. Another related question: if we want to do rare variant analysis, which is the better choice when building the model in step 1? Option one is to use common variants filtered at a higher MAF (for example, 0.1) followed by LD pruning, so that the model includes higher-confidence SNVs. Option two is to use variants filtered at a low MAF (for example, 0.001) followed by LD pruning, so that the model includes more rare SNVs.
The goal of step 1 is to capture common genetic variation genome-wide, so a MAF threshold of 1% or 5% is sufficient (combined with stringent LD pruning to reduce the number of variants).
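As a rough illustration of the MAF filter plus LD pruning suggested above, here is a minimal Python sketch that only assembles a plink2 command line (it does not run it). The file prefixes `my_genotypes` and `step1_pruned` are hypothetical, and the pruning window, step, and r² threshold (1000, 100, 0.9) are assumed values for illustration, not settings given in this thread.

```python
import shlex

def build_prune_command(bfile, out_prefix, maf=0.01,
                        window=1000, step=100, r2=0.9):
    """Build a plink2 argument list: MAF filter followed by LD pruning.

    All default values are illustrative assumptions, not recommendations
    from the regenie maintainers.
    """
    return [
        "plink2",
        "--bfile", bfile,              # hypothetical input prefix
        "--maf", str(maf),             # keep variants with MAF >= maf
        "--indep-pairwise", str(window), str(step), str(r2),
        "--out", out_prefix,           # plink2 writes <out_prefix>.prune.in
    ]

cmd = build_prune_command("my_genotypes", "step1_pruned")
print(shlex.join(cmd))
```

The resulting `step1_pruned.prune.in` list could then be passed to regenie step 1 through its `--extract` option, so that only the pruned variants are used for model fitting.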
That makes sense. Thank you so much for the quick response. For the criteria "not present in low-complexity regions, not involved in inter-chromosomal LD", how did you achieve this?
Based on the introduction, about 500,000 variants should be enough for fitting the model. Usually, we use a few criteria to keep high-quality variants, for example:
However, too many variants might still remain after filtering.
My question is: which of the following would be better for Regenie model fitting and testing?
Randomly sampling the QC-filtered variants down to the required number, such as 500,000. The average MAF of the variants would not be very high.
Tightening the QC criteria, for example "--mac 800 --geno 0.05" and so on, which might result in a lot of very high-MAF variants.
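For reference, the first option (random sampling) could be sketched as below. This is only an illustration of the downsampling step, assuming the QC-passed variant IDs are already loaded into a list. The function name `downsample_variants` and the example IDs are hypothetical, and the sketch does not say which of the two options is preferable.

```python
import random

def downsample_variants(variant_ids, target, seed=0):
    """Return a reproducible random subset of at most `target` variant IDs.

    The original order of the IDs is preserved in the output.
    """
    if len(variant_ids) <= target:
        return list(variant_ids)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep = set(rng.sample(range(len(variant_ids)), target))
    return [v for i, v in enumerate(variant_ids) if i in keep]

# Hypothetical variant IDs standing in for a QC-passed SNP list
ids = [f"rs{i}" for i in range(1, 1001)]
subset = downsample_variants(ids, 50)
print(len(subset))
```

Writing the sampled IDs to a text file, one per line, would give a list in the same format as a plink2 `.snplist` file.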