Document output format with --htp flag for region-based regenie step 2
#524
Labels: documentation
The output format for gene/region-based regenie step 2 [edit: when using `--htp`] does not appear to be documented; https://rgcgithub.github.io/regenie/options/#output_1 just says it is "the same output format mentioned above."

With `--build-mask sum`, [edit: as mentioned in #336, `--htp` is intentionally ignored in this case, so] the columns have the same names as a single-variant run, but they are used in ways that don't quite match the header. Empirically, it looks like:

- `CHROM` and `GENPOS` are the coordinates of the first variant in the region
- `ID` is `<gene name>.<mask name>`
- `ALLELE0` is "ref"
- `ALLELE1` is the mask name
- `A1FREQ` is presumably calculated after combining all alleles into a pseudo-allele as specified by `--build-mask` (right?)
- `TEST` is "ADD" (at least with all the input parameters I tried)
- `EXTRA` is usually NA; presumably it's that "additional column included to specify if Firth/SPA corrections failed."

With [edit: `--htp` and] `--build-mask max` or `--build-mask comphet`, however, the output format is completely different and now has its own set of columns. As far as I can tell:

- `Name` is `<gene name>.<mask name>`
- `Chr` and `Pos` are again the coordinates of the first variant
- `Ref` is "ref"
- `Alt` is the mask
- `Trait` is the phenotype name
- `Cohort` is "TEST"
- `Model` is something like "ADD-WGR-FIRTH" (I get the gist, but I'm not sure exactly what information is conveyed here?)
- `Effect`, `LCI_Effect` (?), and `UCI_Effect` (?) appear instead of a single beta
- `Pval` is self-explanatory
- `AAF` I'm again assuming is calculated based on the combined pseudo-alleles
- `Num_Cases`, `Cases_Ref`, etc. are self-explanatory for binary traits; for continuous traits the counts seem to be shoehorned in by considering everyone a "case" and leaving the "controls" columns NA
- `Info` seems to hold a series of semicolon-separated `name=value` pairs, e.g. `REGENIE_BETA=0.026852;REGENIE_SE=0.009518;MAC=160787.000000;SCORE=294.673996;SKATV=10948.247814;LOG10P=2.313457`

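To make the `Info` observation concrete, here is a minimal Python sketch of how a downstream script might unpack that column. It assumes the field really is a flat list of semicolon-separated `name=value` pairs, as in the example above; that is an empirical guess, not documented behavior, and `parse_info` is a hypothetical helper name, not part of regenie.

```python
def parse_info(info):
    """Split a semicolon-separated name=value string into a dict.

    Assumption (not confirmed by the regenie docs): every entry has the
    form name=value and neither part contains ';'.
    """
    fields = {}
    for pair in info.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        name, sep, value = pair.partition("=")
        if not sep:
            fields[name] = None  # bare flag without a value
            continue
        try:
            fields[name] = float(value)
        except ValueError:
            fields[name] = value  # keep non-numeric values (e.g. "NA") as text
    return fields


# The example string quoted in this issue.
example = (
    "REGENIE_BETA=0.026852;REGENIE_SE=0.009518;MAC=160787.000000;"
    "SCORE=294.673996;SKATV=10948.247814;LOG10P=2.313457"
)
info = parse_info(example)
print(sorted(info))
```

Values that do not parse as numbers are kept as strings, so an entry such as `NA` survives unchanged rather than raising.
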
I'd greatly appreciate if you could confirm or correct my assumptions here, fill in the gaps, and put all the information into the official documentation. A few "sample output" files could also be quite helpful, especially for designing workflows that take regenie's output as their input.
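As an illustration of the workflow concern, the following Python sketch shows how a consumer might tell the two layouts apart from the header and pull out the gene and mask. It relies entirely on the unconfirmed column meanings guessed above; the headers list only the columns discussed in this issue (not the full header), the row values are made up, and `split_region_id` and `normalize_row` are hypothetical helper names.

```python
def split_region_id(region_id):
    """Split '<gene name>.<mask name>' on the last dot.

    Assumption: mask names contain no '.', so the last dot is the separator.
    """
    gene, sep, mask = region_id.rpartition(".")
    if not sep:
        return region_id, None  # no dot found: treat the whole ID as the gene
    return gene, mask


def normalize_row(header, values):
    """Map a row from either layout onto a common set of keys."""
    row = dict(zip(header, values))
    if "Name" in row:  # layout seen with --htp and --build-mask max/comphet
        layout, id_col, chrom_col, pos_col = "htp", "Name", "Chr", "Pos"
    elif "ID" in row:  # single-variant-style layout seen with --build-mask sum
        layout, id_col, chrom_col, pos_col = "standard", "ID", "CHROM", "GENPOS"
    else:
        raise ValueError("unrecognized header")
    gene, mask = split_region_id(row[id_col])
    return {
        "layout": layout,
        "gene": gene,
        "mask": mask,
        "chrom": row[chrom_col],
        "pos": int(row[pos_col]),  # assumed: coordinate of the first variant
    }


# Made-up rows, restricted to columns named in this issue.
standard_header = ["CHROM", "GENPOS", "ID", "ALLELE0", "ALLELE1", "A1FREQ", "TEST", "EXTRA"]
standard_values = ["1", "12345", "GENE1.M1", "ref", "M1", "0.01", "ADD", "NA"]
htp_header = ["Name", "Chr", "Pos", "Ref", "Alt", "Trait", "Cohort", "Model"]
htp_values = ["GENE1.M1", "1", "12345", "ref", "M1", "PHENO1", "TEST", "ADD-WGR-FIRTH"]

print(normalize_row(standard_header, standard_values))
print(normalize_row(htp_header, htp_values))
```

Official sample output files would let a check like this be validated against real headers instead of guesses.
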