how to limit the number of variants used in step 1 #587

shengqh · 2024-12-24T22:25:37Z

Based on the introduction, about 500,000 variants should be enough to fit the model. We usually apply a few criteria to keep high-quality variants, for example:

--mac 200 --geno 0.1 --maf 0.1

However, too many variants may still remain after filtering.

My question is: which of the following would be better for REGENIE model fitting and testing?

1. Randomly sample the QC-filtered variants down to the required number, such as 500,000. The average MAF of the sampled variants would not be very high.
2. Tighten the QC criteria, for example "--mac 800 --geno 0.05" and so on, which might result in many variants with very high MAF.
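For illustration, the random-sampling option could be sketched in Python as below. This is a minimal sketch, not part of REGENIE: the function names and file paths are hypothetical, and it assumes the QC-filtered variants are available as a plain text file with one variant ID per line (for example, a list written by PLINK's `--write-snplist`).

```python
import random


def sample_variant_ids(variant_ids, target, seed=0):
    """Draw `target` IDs uniformly at random, keeping the original order.

    Hypothetical helper: if fewer than `target` IDs are available,
    all of them are returned unchanged.
    """
    if target >= len(variant_ids):
        return list(variant_ids)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep = sorted(rng.sample(range(len(variant_ids)), target))
    return [variant_ids[i] for i in keep]


def subsample_snplist(in_path, out_path, target=500_000, seed=0):
    """Read one variant ID per line, write a random subset, return its size."""
    with open(in_path) as fh:
        ids = [line.strip() for line in fh if line.strip()]
    picked = sample_variant_ids(ids, target, seed)
    with open(out_path, "w") as fh:
        fh.write("\n".join(picked) + "\n")
    return len(picked)
```

The resulting list could then be supplied to step 1 as the set of variants to keep. Sampling uniformly leaves the MAF distribution of the QC-filtered set roughly unchanged, which is the trade-off raised in option 1.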


Ojami · 2024-12-27T11:03:35Z

Please see #497 and #530.

Hope this helps.

shengqh · 2024-12-28T04:31:22Z

In your post:

From REGENIE paper:

a minor allele frequency of ≥1%, a Hardy–Weinberg equilibrium test not exceeding P = 1 × 10⁻¹⁵, a genotyping rate above 99%, not present in low-complexity regions, not involved in inter-chromosomal LD and LD pruning using a R² threshold of 0.9 with a window size of 1,000 markers and a step size of 100 markers. This resulted in up to 471,762 genotyped SNPs that were kept in the analyses

Were those 471,762 genotyped SNPs used in both step 1 and step 2, or just step 1?

joellembatchou · 2025-01-03T21:35:48Z

Hi,

You could also perform LD pruning to reduce the number of variants used in step 1, in addition to using more stringent QC parameters.

For your question on the analysis in the paper, the 471,762 SNPs were used in step 1; for step 2 we tested those variants as well as a larger set of imputed variants (using the same step 1 output file).

Cheers,
Joelle

shengqh · 2025-01-07T21:55:29Z

Thank you so much. I have another related question. If we want to do rare variant analysis, which approach is better when building the model in the first step: using common variants filtered by a higher MAF (for example 0.1) followed by LD pruning, to include more confident SNVs in the model, or using variants filtered by a low MAF (for example 0.001) followed by LD pruning, to include more rare SNVs in the model?

joellembatchou · 2025-01-07T22:45:37Z

The goal of step 1 is to capture common genetic variation genome-wide, so a MAF threshold of 1% or 5% is sufficient (combined with stringent LD pruning to reduce the number of variants).
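As a rough sketch of what a MAF filter plus LD pruning might look like in practice, the Python snippet below only assembles a PLINK 2 command line; it does not run anything. The helper name and the file prefixes are hypothetical, and the default window, step, and R² values are simply those quoted from the REGENIE paper earlier in this thread, not a recommendation from this comment.

```python
def build_prune_command(bfile, out_prefix, maf=0.01,
                        window=1000, step=100, r2=0.9):
    """Assemble a plink2 argument list for MAF filtering plus LD pruning.

    Hypothetical helper. Defaults mirror the values quoted from the paper:
    MAF >= 1%, window of 1,000 markers, step of 100 markers, R^2 of 0.9.
    """
    return [
        "plink2",
        "--bfile", bfile,            # prefix of the .bed/.bim/.fam fileset
        "--maf", str(maf),           # drop variants below this MAF
        "--indep-pairwise", str(window), str(step), str(r2),
        "--out", out_prefix,         # pruned-in IDs go to <out_prefix>.prune.in
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before being run by hand.
    print(" ".join(build_prune_command("my_genotypes", "step1_pruned")))
```

PLINK writes the retained variant IDs to `<out_prefix>.prune.in`, which could then be passed to REGENIE step 1 via `--extract`. A lower R² threshold prunes more aggressively if the variant count is still too high.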

shengqh · 2025-01-08T05:54:50Z

That makes sense. Thank you so much for the quick response.

For "not present in low-complexity regions, not involved in inter-chromosomal LD", how did you achieve this goal?
