This tool is designed for quality control of Illumina microarrays genotyping data.
To run this tool on your machine, you need to have Python 2.7 or 3 installed. Besides that, you need several packages, which can be installed using pip. -scikit-learn -plotly
- Most common scenario: python3 cuesea.py Genome_Studio_Report.txt --snps=snps.txt This command will run QC using your Genome_Studio_Report.txt as data file and snps.txt as list of snps to analyze. X and Y chromosome snps treated as autosomal.
- Gender python3 cuesea.py Genome_Studio_Report.txt --snps=snps.txt --gender gender.txt This command will run QC using your Genome_Studio_Report.txt as data file and snps.txt as list of snps to analyze. This will include proper analyze of X and Y chromosome snps as described in HOW IT WORKS paragraph.
- Angle python3 cuesea.py Genome_Studio_Report.txt --angle 8.0 The --angle key specifies max angle between points of each cluster. The default value is 8.0 degrees, but you can slightly increase or dicrease it.
- Genome Studio Report This tool uses default Genome Studio report, and we use columns:
- SNP Name (which is Illumina Name)
- Sample ID
- Allele 1 - Top
- Allele 2 - Top
- Chr
- X
- Y
- Snps list List of snps, one per line.
- Gender File with two tab-separated columns (Sample ID, gender) no header. F - for women, M - for men. Other symbols treated as undefined gender. For example: 0000-1111 F
Tool outputs 3 types of reports:
- Sample Report - list of snps, which have not pass the QC. This file contains columns: SNP - Sample ID - error code. Error code 0 - snp fails intensity check Error code 1 - snp failed clusterization step Error code 2 - snp failed gender check Error code 3 - snp’s genotype distribution doesn’t fit to HWE
- Overall report - list of snps with statistics. This file contains columns: SNP - Number of samples - Excluded by QC - Excluded by R(intensity) - Excluded by clusterization Or if it was gender file on input: SNP - № of samples - Excluded by QC - Excluded by R(intensity) - Excluded by clusterization - Excluded from X\Y
- Genotype Report - information about SNPs with changed genotype class This file contains columns: SNP - Sample ID - old GT class - new GT class
- Plots Samples marked green - passes QC Samples marked blue - not passed QC Samples marked red - changed GT class
Daria Iakovishina, PhD
Nikita Moshkov
GNU GPL v.3