This section describes a pipeline in development whose purpose is to perform a meta-analysis of GWAS results supplied in various file formats. Our script, meta-assoc.nf, takes as input various GWAS result files and rsIDs and performs a meta-analysis with METAL, GWAMA and Metasoft.
Requirements: python3, METAL (latest version: https://github.com/statgen/METAL), GWAMA, MR-MEGA and Metasoft (one version is available in utils/bin/).
The key options are:
- `work_dir`: the directory in which you will run the workflow. This will typically be the h3agwas directory which you cloned;
- `input`, `output` and `script` directories: the default is that these are subdirectories of the `work_dir`, and there will seldom be reason to change these;
- `output_dir` = "all"
- meta-analysis options:
  - `metal`: 1 to perform METAL (default 0)
  - `gwama`: 1 to perform GWAMA (default 0)
  - `plink`: 1 to perform the meta-analysis in plink (default 0)
  - `metasoft`: 1 to perform Metasoft (default 0)
  - `metasoft_pvalue_table`: file needed by Metasoft: HanEskinPvalueTable.txt
  - `mrmega`: 1 to perform MR-MEGA (default 0)
- `file_config`: describes all the information for each GWAS result used in the meta-analysis
  - the file is comma separated (csv), and each line describes one file
  - the header of the config file is: rsID,Chro,Pos,A1,A2,Beta,Se,Pval,N,freqA1,direction,Imputed,Sep,File,IsRefFile
  - `rsID`: column name for the rsID in the GWAS file
  - `Chro`: column name for the chromosome in the GWAS file
  - `Pos`: column name for the position in the GWAS file
  - `A1`: column name for the reference allele in the GWAS file
  - `A2`: column name for the alternative allele in the GWAS file
  - `Beta`: column name for the beta values in the GWAS file
  - `Se`: column name for the standard error values in the GWAS file
  - `Pval`: column name for the p-values in the GWAS file
  - `N`: column name for the sample size in the GWAS file
  - `freqA1`: column name for freqA1 or the MAF in the GWAS file
  - `direction`: column name for the strand of association (-/+) in the GWAS file
  - `Imputed`: column name indicating whether each position is imputed in the GWAS file
  - `NCount`: value used to fill a column N added to your file
  - `Sep`: the separator used in the GWAS file. You can use characters such as ; . : but, to avoid trouble, you can use:
    - COM: comma
    - TAB: tab
    - WHI: white space
  - `File`: GWAS file with its full path
  - `IsRefFile`: 1 to mark the reference file
  - if one of the columns is missing in your GWAS file, replace it with NA
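As an illustration of the `file_config` layout described above, the sketch below reads the csv and turns each line into a dictionary. It maps `NA` to "absent" and translates the `Sep` codes into separator characters. It is a hypothetical helper, not part of the pipeline. Mapping `WHI` to a single space and the example path `/data/study1.gemma` are assumptions.

```python
import csv
import io

# Separator codes documented for the Sep column. Mapping WHI to a single
# space is an assumption; the pipeline may treat it as any run of whitespace.
SEP_CODES = {"COM": ",", "TAB": "\t", "WHI": " "}


def parse_file_config(text):
    """Parse file_config text (csv with a header line, one GWAS file per line).

    "NA" entries become None and the Sep code is translated to the
    separator character; any other Sep value is kept as a literal character.
    """
    entries = []
    for row in csv.DictReader(io.StringIO(text)):
        entry = {key: (None if value == "NA" else value) for key, value in row.items()}
        entry["Sep"] = SEP_CODES.get(entry.get("Sep"), entry.get("Sep"))
        entries.append(entry)
    return entries


config = (
    "rsID,Chro,Pos,A1,A2,Beta,Se,Pval,N,freqA1,direction,Imputed,Sep,File,IsRefFile\n"
    "rs,chr,ps,allele0,allele1,beta,se,p_wald,NA,af,NA,NA,TAB,/data/study1.gemma,1\n"
)
first = parse_file_config(config)[0]
print(first["File"], repr(first["Sep"]), first["N"])
```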
- optional options:
  - memory usage:
    - `plink_mem_req`: plink memory [20GB]
    - `gwama_mem_req`: GWAMA memory [20GB]
    - `metasoft_mem_req`: Metasoft memory ["20G"]
    - `ma_mem_req`: memory requested for data extraction, format conversion and Manhattan plots ["10G"]
    - `mrmega_mem_req`: MR-MEGA memory ["20GB"]
- `used_pval_z`: compute beta and se from the p-value, the sign of the initial beta, the N value and the frequency [default 0: no]
  - formula used to compute beta and se, where `pinit` is the initial p-value and `sens` the sign of the initial beta:
    - `z=abs(stats.norm.ppf(1-pinit/2))*sens`
    - `b=z/sqrt(2*freq*(1-freq)*(n+z**2))`
    - `se=1/sqrt(2*freq*(1-freq)*(n+z**2))`
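The `used_pval_z` formulas above can be checked with the short sketch below. It uses the standard library's `NormalDist().inv_cdf`, which computes the same normal quantile as scipy's `stats.norm.ppf`. The function name `beta_se_from_pvalue` is hypothetical, and treating a zero beta as a positive effect is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def beta_se_from_pvalue(pinit, beta_init, n, freq):
    """Recompute (beta, se) from a p-value, the sign of the initial beta,
    the sample size n and the allele frequency freq."""
    # sens: +1 or -1, the direction of the initial effect (zero treated as +1)
    sens = 1.0 if beta_init >= 0 else -1.0
    # NormalDist().inv_cdf is the standard-library equivalent of stats.norm.ppf
    z = abs(NormalDist().inv_cdf(1 - pinit / 2)) * sens
    denom = sqrt(2 * freq * (1 - freq) * (n + z ** 2))
    return z / denom, 1 / denom


b, se = beta_se_from_pvalue(pinit=0.05, beta_init=-0.3, n=1000, freq=0.3)
print(round(b / se, 3))  # b/se equals z, about -1.96 for p = 0.05
```

By construction `b/se` equals `z`, so the recomputed Wald statistic reproduces the original p-value and sign. Only the scale of the effect depends on `n` and `freq`.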
* CPUs and memory:
  * `max_plink_cores`: [default 4]
  * other processes use 1 CPU
* binaries:
  * `metal_bin`: binary for METAL (default: _metal_)
  * `gwama_bin`: binary for GWAMA (default: _GWAMA_)
  * `metasoft_bin`: Java binary for Metasoft (default: _Metasoft.jar_)
  * `mrmega_bin`: binary for MR-MEGA
  * `plink_bin`: binary for plink
* software options:
  * `ma_metasoft_opt`: append other options to the Metasoft command line (default: null)
  * `ma_genomic_cont`: use genomic control in METAL and GWAMA (default: 0)
  * `ma_weigthedz`: use the p-value and N parameters for plink and METAL (default: 0); if 1, the p-value and N are used, otherwise beta and se
  * `ma_overlap_sample`: do you have sample overlap? Used by METAL (default: 0)
  * `metal_het`: compute heterogeneity, option used by METAL (default: 0)
  * `ma_random_effect`: use a random-effects (mixed) model (default 1)
  * `ma_mrmega_pc`: number of PCs used by MR-MEGA (default: 4)
  * `ma_mrmega_opt`: append other options to the MR-MEGA command line (default: null)
  * `us_rs`: if 1, chromosome and position are replaced using the rsID (warning: you need to be sure that a given chromosome position has the same rsID in each file) [default 0]; otherwise chromosome and position are used to replace the rsID
MR-MEGA needs the chromosome, position and N (sample number) for each position. The reference file of the pipeline (the file with 1 in `IsRefFile` in `file_config`) must therefore have chromosome and position.
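A pre-flight check for the MR-MEGA requirement above might look like the sketch below. It finds the `file_config` entry flagged with `IsRefFile` = 1 and verifies that it names chromosome and position columns. This is a hypothetical helper, not the pipeline's own validation. Requiring exactly one reference file is an assumption.

```python
def mrmega_reference(entries):
    """Return the file_config entry flagged with IsRefFile = 1 after checking
    that it names chromosome and position columns (needed by MR-MEGA).

    entries: list of dicts keyed by the file_config header, e.g. rows from
    csv.DictReader; "NA", "" or None mean the column is absent.
    Requiring exactly one reference file is an assumption of this sketch.
    """
    refs = [e for e in entries if str(e.get("IsRefFile", "")).strip() == "1"]
    if len(refs) != 1:
        raise ValueError(f"expected one reference file (IsRefFile = 1), found {len(refs)}")
    ref = refs[0]
    absent = [col for col in ("Chro", "Pos") if ref.get(col) in (None, "", "NA")]
    if absent:
        raise ValueError(f"reference file {ref.get('File')} has no {' or '.join(absent)} column")
    return ref


entries = [
    {"File": "study1.gemma", "Chro": "chr", "Pos": "ps", "IsRefFile": "1"},
    {"File": "study2.txt", "Chro": "NA", "Pos": "NA", "IsRefFile": "0"},
]
print(mrmega_reference(entries)["File"])  # study1.gemma
```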
Software | `ma_genomic_cont` | `ma_weigthedz`¹ | `ma_overlap_sample` | `ma_random_effect` |
---|---|---|---|---|
description | genomic control | use p-value and N (1) or beta and se (0) | sample overlap | random effect |
default | 0 | 0 | 0 | 0 |
METAL | yes | yes | yes | no |
GWAMA | yes | default no | yes | |
MR-MEGA | yes | no | no | no |
plink | no | yes | no | no |
Metasoft | no | no | no | default |

¹ `ma_weigthedz` requests weighted Z-score-based p-values (as computed by the Abecasis Lab's METAL software).
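The two weighting choices behind `ma_weigthedz` can be sketched as follows, using the textbook formulas. With 1, each study's signed Z-score is weighted by the square root of its sample size. With 0, a fixed-effect inverse-variance average of beta is computed, weighting each study by 1/se². This is an illustration of the general techniques only. It does not reproduce the exact code paths of METAL, plink or this pipeline, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def weighted_z(pvals, betas, ns):
    """Sample-size weighted Z-score meta-analysis (ma_weigthedz = 1 style)."""
    num = 0.0
    sum_w2 = 0.0
    for p, beta, n in zip(pvals, betas, ns):
        # signed Z of each study: |Phi^-1(p/2)| with the sign of its beta
        z = abs(_NORMAL.inv_cdf(p / 2)) * (1.0 if beta >= 0 else -1.0)
        w = sqrt(n)  # weight = square root of the sample size
        num += w * z
        sum_w2 += w * w
    z_meta = num / sqrt(sum_w2)
    return z_meta, 2 * _NORMAL.cdf(-abs(z_meta))  # two-sided p-value


def inverse_variance(betas, ses):
    """Fixed-effect inverse-variance meta-analysis (ma_weigthedz = 0 style)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_meta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta_meta, sqrt(1.0 / total)


print(weighted_z([0.05, 0.01], [0.12, 0.20], [1000, 2500]))
print(inverse_variance([0.12, 0.20], [0.06, 0.04]))
```

The Z-score scheme only needs the p-value, the effect direction and N. This is why it can combine studies whose betas are on different scales. The inverse-variance scheme needs comparable beta and se across studies.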
Software | Manuals | References |
---|---|---|
METAL | here | here |
GWAMA | here | here |
MR-MEGA | here | here |
plink | here | here |
Metasoft | here | here |
- data and command lines can be found in h3agwas-examples
- a csv file is needed to describe each input; it contains a header describing the columns of each file:
echo "rsID,Chro,Pos,A1,A2,Beta,Se,Pval,N,freqA1,direction,Imputed,Sep,File,Ncount" > utils/input_meta.csv
for File in `ls data/summarystat/*.gemma|grep -v ".all"`
do
echo "rs,chr,ps,allele0,allele1,beta,se,p_wald,NA,af,NA,NA,TAB,$File,2500" >> utils/input_meta.csv
done
- input:
  - the user can choose which software to run: METAL (`--metal 1`), GWAMA (`--gwama 1`), Metasoft (`--metasoft 1`), MR-MEGA (`--mrmega 1`) and plink (`--plink 1`)

nextflow run ~/Travail/git/h3agwas/meta/meta-assoc.nf --metal 1 --gwama 1 --metasoft 1 --mrmega 1 --plink 1 --file_config utils/input_meta.csv -resume -profile slurmSingularity --output_dir meta
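When several software combinations are run, the command above can be assembled programmatically. The sketch below is a hypothetical helper that uses only the flags shown in the example command. The default script location `h3abionet/h3agwas/meta/meta-assoc.nf` is an assumption, made by analogy with the remote path used for `mtag-assoc.nf`.

```python
import shlex

# Software switches documented above, in the order of the example command.
SOFTWARE = ("metal", "gwama", "metasoft", "mrmega", "plink")


def meta_assoc_command(selected, file_config, output_dir, profile=None,
                       script="h3abionet/h3agwas/meta/meta-assoc.nf"):
    """Build the argument list of a meta-assoc.nf run (hypothetical helper)."""
    unknown = set(selected) - set(SOFTWARE)
    if unknown:
        raise ValueError(f"unknown software: {sorted(unknown)}")
    args = ["nextflow", "run", script]
    for name in SOFTWARE:
        if name in selected:
            args += [f"--{name}", "1"]  # each software is switched on with 1
    args += ["--file_config", file_config, "--output_dir", output_dir, "-resume"]
    if profile:
        args += ["-profile", profile]
    return args


cmd = meta_assoc_command({"metal", "gwama"}, "utils/input_meta.csv", "meta",
                         profile="slurmSingularity")
print(shlex.join(cmd))
```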
- `file_gwas`: one or more GWAS files of different phenotypes
- `head_pval`: p-value header [default: "P_BOLT_LMM"]
- `head_n`: N (number of individuals) header, used only for ldsc [default: None]; if it is not present, N is computed with plink (and data/pheno if present), or `Nind` must be initialized
- `head_rs`: rs header column [default: "SNP"]
- `head_beta`: beta header column [default: "BETA"]
- `head_se`: column for the standard error of beta [default: "SE"]
- `head_A1`: column for allele A1 [default: "ALLELE0"]
- `head_A2`: column for allele A2 [default: "ALLELE2"]
- `head_freq`: frequency header [default: "A1Freq"]
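The `head_*` options above name the columns to read in each summary-statistic file. The sketch below shows how such options could be resolved against a file header, starting from the documented defaults and overriding them per file. It is a hypothetical helper. The GEMMA header used in the example is the standard `gemma -lmm` output layout, matching the `--head_*` values in the MTAG command of this section.

```python
# Defaults listed above for the head_* options.
DEFAULT_HEADERS = {
    "head_pval": "P_BOLT_LMM",
    "head_rs": "SNP",
    "head_beta": "BETA",
    "head_se": "SE",
    "head_A1": "ALLELE0",
    "head_A2": "ALLELE2",
    "head_freq": "A1Freq",
}


def resolve_columns(header_line, overrides=None, sep="\t"):
    """Map each head_* option to its column index in a summary-statistic header."""
    wanted = {**DEFAULT_HEADERS, **(overrides or {})}
    columns = header_line.rstrip("\n").split(sep)
    index = {}
    for option, name in wanted.items():
        if name not in columns:
            raise KeyError(f"{option}: column {name!r} not found in header")
        index[option] = columns.index(name)
    return index


gemma_header = "chr\trs\tps\tn_miss\tallele1\tallele0\taf\tbeta\tse\tlogl_H1\tl_remle\tp_wald"
gemma_options = {"head_pval": "p_wald", "head_rs": "rs", "head_beta": "beta",
                 "head_se": "se", "head_A1": "allele1", "head_A2": "allele0",
                 "head_freq": "af", "head_chr": "chr", "head_bp": "ps"}
print(resolve_columns(gemma_header, gemma_options))
```

Failing early with the option name makes a mistyped `--head_*` value easy to spot before any long-running step starts.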
- if N is not initialized:
  - plink files are used to compute N for each position:
    - `input_pat`: input pattern of the plink files
    - `input_dir`: input directory of the plink files
  - `list_n`: not yet implemented
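One way to obtain a per-position N from plink files is to count the non-missing genotype calls reported by `plink --missing`. That command writes a `.lmiss` table with `SNP`, `N_MISS` and `N_GENO` columns. The sketch below parses such a table. It is an assumption about how N could be derived and does not necessarily match what the pipeline does internally. The example rows are made up.

```python
def n_per_snp(lmiss_text):
    """Return {SNP: number of non-missing genotype calls} from plink --missing
    .lmiss output (whitespace-separated, with N_MISS and N_GENO columns)."""
    lines = lmiss_text.strip().splitlines()
    header = lines[0].split()
    i_snp, i_miss, i_geno = (header.index(c) for c in ("SNP", "N_MISS", "N_GENO"))
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        # N_GENO counts potentially valid calls, N_MISS the missing ones
        counts[fields[i_snp]] = int(fields[i_geno]) - int(fields[i_miss])
    return counts


example = (
    " CHR  SNP  N_MISS  N_GENO  F_MISS\n"
    "   1  rs1      10    2500   0.004\n"
    "   1  rs2       0    2500       0\n"
)
print(n_per_snp(example))
```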
- data and command lines can be found in h3agwas-examples
- multi-trait analysis of GWAS (MTAG): a method for the joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples.
- input:
  - list of summary statistics, `file_gwas`, and the headers of the GWAS files: `head_[name]`
  - you can also give information about the plink and phenotype data, which can be used to add an N value to each summary statistic:

--input_dir data/imputed/ --input_pat imput_data --pheno pheno_qt1,pheno_qt2 --data data/pheno/pheno_test.all

nextflow run h3abionet/h3agwas/meta/mtag-assoc.nf --head_freq af --head_pval p_wald --head_bp ps --head_chr chr --head_rs rs --head_beta beta --head_se se --head_A1 allele1 --head_A2 allele0 --input_dir data/imputed/ --input_pat imput_data --file_gwas data/summarystat/all_pheno.gemma,data/summarystat/all_phenoq2.gemma --pheno pheno_qt1,pheno_qt2 --data data/pheno/pheno_test.all -resume -profile slurmSingularity
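The step of adding an N value to each summary statistic can be sketched as a join on the rsID. The helper below appends an `N` column to a tab-separated table, using a per-SNP N dictionary such as one computed from the plink files. Positions without a known N get `NA`. The function name, the `rs` column default and the `NA` fill value are assumptions of this illustration, not the script's documented behaviour.

```python
def add_n_column(gwas_text, n_by_snp, rs_col="rs", sep="\t", missing="NA"):
    """Append an N column to a summary-statistic table, looked up by rsID."""
    lines = gwas_text.rstrip("\n").split("\n")
    header = lines[0].split(sep)
    i_rs = header.index(rs_col)
    out = [sep.join(header + ["N"])]
    for line in lines[1:]:
        fields = line.split(sep)
        out.append(sep.join(fields + [str(n_by_snp.get(fields[i_rs], missing))]))
    return "\n".join(out) + "\n"


table = "chr\trs\tps\tbeta\tse\n1\trs1\t1000\t0.10\t0.02\n1\trs2\t2000\t-0.05\t0.03\n"
print(add_n_column(table, {"rs1": 2490}), end="")
```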