Document output format with --htp flag for region-based regenie step 2 #524

Open
dvg-p4 opened this issue May 13, 2024 · 5 comments
Labels
documentation Improvements or additions to documentation

@dvg-p4

dvg-p4 commented May 13, 2024

The output format for gene/region-based regenie step 2 [edit: when using --htp] does not appear to be documented; https://rgcgithub.github.io/regenie/options/#output_1 just says it is "the same output format mentioned above."

With --build-mask sum, [edit: as mentioned in #336, --htp is intentionally ignored in this case, so] the columns have the same names as a single-variant run:

CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST BETA SE CHISQ LOG10P EXTRA

but they are used in ways that don't quite match the header. Empirically, it looks like:

  • CHROM and GENPOS are the coordinates of the first variant in the region
  • ID is <gene name>.<mask name>
  • ALLELE0 is "ref"
  • ALLELE1 is the mask name
  • A1FREQ is presumably calculated after combining all alleles into a pseudo-allele as specified by --build-mask (right?)
  • TEST is "ADD" (at least with all the input parameters I tried)
  • EXTRA is usually NA; presumably it's that "additional column included to specify if Firth/SPA corrections failed."
  • The rest of the columns are exactly what they say on the tin.
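
For anyone writing a downstream parser in the meantime, the observations above could be sketched roughly like this. It only illustrates the empirical layout described in this issue, not documented regenie behavior. The function name `parse_native_mask_row` and the example row are made up, and it assumes whitespace-delimited fields, that ALLELE1 holds the mask name, and that LOG10P is -log10(p).

```python
# Column names as they appear in the header quoted above.
NATIVE_COLUMNS = ("CHROM GENPOS ID ALLELE0 ALLELE1 A1FREQ N TEST "
                  "BETA SE CHISQ LOG10P EXTRA").split()


def _num(value):
    """Convert a numeric field to float, mapping 'NA' to None."""
    return None if value == "NA" else float(value)


def parse_native_mask_row(line):
    """Parse one whitespace-delimited result row (hypothetical helper)."""
    fields = line.split()
    if len(fields) != len(NATIVE_COLUMNS):
        raise ValueError(
            f"expected {len(NATIVE_COLUMNS)} fields, got {len(fields)}"
        )
    row = dict(zip(NATIVE_COLUMNS, fields))

    # Assumption from this issue: ID is <gene name>.<mask name> and ALLELE1
    # repeats the mask name, so strip that suffix instead of splitting on "."
    # (gene names may themselves contain dots).
    mask = row["ALLELE1"]
    suffix = "." + mask
    gene = row["ID"][: -len(suffix)] if row["ID"].endswith(suffix) else row["ID"]

    log10p = _num(row["LOG10P"])
    return {
        "gene": gene,
        "mask": mask,
        "chrom": row["CHROM"],
        "first_variant_pos": int(row["GENPOS"]),
        "a1freq": _num(row["A1FREQ"]),
        "beta": _num(row["BETA"]),
        "se": _num(row["SE"]),
        "p_value": None if log10p is None else 10.0 ** -log10p,
        "extra": None if row["EXTRA"] == "NA" else row["EXTRA"],
    }


# Made-up row for illustration only; not real regenie output.
example = "1 12345 GENE1.M1 ref M1 0.01 1000 ADD 0.5 0.1 25 6 NA"
result = parse_native_mask_row(example)
print(result["gene"], result["mask"], result["p_value"])
```

Stripping the mask suffix rather than splitting the ID on the first dot is a deliberate choice, since the exact ID layout is one of the things this issue asks to have confirmed.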

With [edit: --htp and] --build-mask max or --build-mask comphet however, the output format is completely different, and now includes the following columns:

Name Chr Pos Ref Alt Trait Cohort Model Effect LCI_Effect UCI_Effect Pval AAF Num_Cases Cases_Ref Cases_Het Cases_Alt Num_Controls Controls_Ref Controls_Het Controls_Alt Info

As far as I can tell:

  • Name is <gene name>.<mask name>
  • Chr and Pos are again the coordinates of the first variant
  • Ref is "ref"
  • Alt is the mask
  • Trait is the phenotype name
  • Cohort is "TEST"
  • Model is something like "ADD-WGR-FIRTH" (I get the gist but I'm not sure what the exact info conveyed here is?)
  • There are now columns for Effect, LCI_Effect (?), and UCI_Effect (?) instead of a single beta
  • Pval is self-explanatory
  • AAF I'm again assuming is calculated based on the combined pseudo-alleles
  • Num_Cases, Cases_Ref, etc. are self-explanatory for binary traits, and for continuous traits the counts seem to be shoehorned in by considering everyone a "case" and leaving the "controls" columns NA
  • Info seems to have a series of semicolon-separated name=value pairs, e.g. REGENIE_BETA=0.026852;REGENIE_SE=0.009518;MAC=160787.000000;SCORE=294.673996;SKATV=10948.247814;LOG10P=2.313457
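
Until the format is documented, a workflow consuming this output might handle it along these lines. This is a minimal sketch based only on the column list and the Info example quoted in this issue. The helper names `parse_info` and `parse_htp_row` are hypothetical, and it assumes whitespace-delimited rows with no spaces inside any field.

```python
# Column names copied from the header quoted above.
HTP_COLUMNS = (
    "Name Chr Pos Ref Alt Trait Cohort Model Effect LCI_Effect UCI_Effect "
    "Pval AAF Num_Cases Cases_Ref Cases_Het Cases_Alt Num_Controls "
    "Controls_Ref Controls_Het Controls_Alt Info"
).split()


def parse_info(info):
    """Split 'KEY=value;KEY=value' into a dict, converting numbers to float."""
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled semicolon
        key, sep, value = item.partition("=")
        if not sep:
            parsed[key] = True  # bare flag with no value
            continue
        try:
            parsed[key] = float(value)
        except ValueError:
            parsed[key] = value  # keep non-numeric values (e.g. "NA") as text
    return parsed


def parse_htp_row(line):
    """Parse one row into a dict; 'NA' becomes None, Info becomes a dict."""
    fields = line.split()
    if len(fields) != len(HTP_COLUMNS):
        raise ValueError(f"expected {len(HTP_COLUMNS)} fields, got {len(fields)}")
    row = {k: (None if v == "NA" else v) for k, v in zip(HTP_COLUMNS, fields)}
    row["Info"] = parse_info(row["Info"] or "")
    return row


# Info string taken from the example in this issue.
info = parse_info(
    "REGENIE_BETA=0.026852;REGENIE_SE=0.009518;MAC=160787.000000;"
    "SCORE=294.673996;SKATV=10948.247814;LOG10P=2.313457"
)
print(info["REGENIE_BETA"], info["REGENIE_SE"], info["LOG10P"])
```

Values outside Info are left as strings on purpose, since the types and scales of columns such as Effect are exactly what is unconfirmed here.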

I'd greatly appreciate if you could confirm or correct my assumptions here, fill in the gaps, and put all the information into the official documentation. A few "sample output" files could also be quite helpful, especially for designing workflows that take regenie's output as their input.

@tgbrooks

I'm guessing that LCI_Effect and UCI_Effect are the lower and upper ends of the confidence interval on the beta (aka Effect). It would be nice to have these documented.
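
If that guess is right and the interval is an ordinary 95% Wald interval, it could be reproduced from the REGENIE_BETA and REGENIE_SE values in the Info column as sketched below. Nothing in this thread confirms that regenie computes it this way or which confidence level it uses. The function name `wald_interval` is made up, and the `exponentiate` switch is only there in case Effect turns out to be on an odds-ratio scale for binary traits, which is also an assumption.

```python
import math
from statistics import NormalDist


def wald_interval(beta, se, level=0.95, exponentiate=False):
    """Return (lower, upper) of a Wald confidence interval: beta +/- z * se."""
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower, upper = beta - z * se, beta + z * se
    if exponentiate:
        # Assumption: move from the log-odds scale to the odds-ratio scale.
        return math.exp(lower), math.exp(upper)
    return lower, upper


# Beta and SE taken from the Info example quoted in this issue.
lci, uci = wald_interval(0.026852, 0.009518)
print(f"{lci:.6f} {uci:.6f}")
```

Comparing these numbers against the LCI_Effect and UCI_Effect columns of a real output file would be a quick way to check the guess.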

@joellembatchou
Collaborator

joellembatchou commented Jun 28, 2024

Hi,

The details of the output with --build-mask are specified on the website:
[image: screenshot of the website's --build-mask output documentation]

What is the full command you are using? (it seems like you may be using the --htp option which is an internal dev command [not documented on the website] & has a different format from the native one)

Cheers,
Joelle

@dvg-p4
Author

dvg-p4 commented Jul 15, 2024

@joellembatchou You are correct, we are using the --htp TEST option. This particular regenie pipeline actually predates my tenure at my company, so I'll ask around and see if anyone else remembers why we did that.

@dvg-p4
Author

dvg-p4 commented Jul 15, 2024

My best guess here is that we specifically wanted the Cases_Ref / Cases_Het / Controls_Alt etc. statistics, which are not available in the standard, documented output format.

@dvg-p4 dvg-p4 changed the title from "Document output format for region-based regenie step 2" to "Document output format with --htp flag for region-based regenie step 2" Jul 15, 2024
@dvg-p4
Author

dvg-p4 commented Aug 5, 2024

So I guess my request is for there to be some documented way to get the Cases_Ref / Cases_Het / Controls_Alt etc. statistics out of regenie: either document the currently-available --htp, or add (an option for) these columns in the standard output format.
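
As a stopgap, counts of this kind could in principle be tallied outside regenie. The sketch below is not how regenie computes them; it is a plain tally under stated assumptions: `genotypes` is a hypothetical list of per-sample alternate-allele counts for one mask (0, 1 or 2, with None for missing), `is_case` is a matching list of True/False/None, and the output keys simply reuse the column names from the --htp header.

```python
from collections import Counter

GENOTYPE_LABELS = {0: "Ref", 1: "Het", 2: "Alt"}


def genotype_counts(genotypes, is_case):
    """Tally Ref/Het/Alt genotypes separately for cases and controls."""
    counts = Counter()
    for genotype, case in zip(genotypes, is_case):
        if genotype is None or case is None:
            continue  # skip samples with a missing genotype or phenotype
        group = "Cases" if case else "Controls"
        counts[f"{group}_{GENOTYPE_LABELS[genotype]}"] += 1

    result = {}
    for group in ("Cases", "Controls"):
        total = 0
        for label in ("Ref", "Het", "Alt"):
            key = f"{group}_{label}"
            result[key] = counts[key]
            total += counts[key]
        result[f"Num_{group}"] = total
    return result


# Toy data for illustration only.
counts = genotype_counts(
    genotypes=[0, 1, 2, 0, None, 1],
    is_case=[True, True, False, False, True, None],
)
print(counts)
```

For a continuous trait, passing True for every sample would mimic the behavior described earlier in this issue, where everyone is counted as a "case".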

@joellembatchou joellembatchou added the documentation Improvements or additions to documentation label Oct 8, 2024
@joellembatchou joellembatchou self-assigned this Oct 8, 2024