Problem opening input.bcf -- No debug log #30

Open
George3d6 opened this issue Feb 10, 2020 · 7 comments

@George3d6

I converted a vcf to bcf and tried running your tool with the following command:
./akt pca -W data/wgs.grch37.vcf.gz input.bcf
The only logs I get are:

Input: input.bcf
Using file data/wgs.grch37.vcf.gz for PCA weights
Problem opening input.bcf

I see no debug option to make this message more verbose and figure out what the issue is. Does a flag for verbose output exist?

I don't believe the problem is permission related; here's the stat output for input.bcf:

  File: input.bcf
  Size: 557747163       Blocks: 1089360    IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 10640983    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  george)   Gid: ( 1000/  george)
Access: 2020-02-10 15:23:13.371180294 +0200
Modify: 2020-02-10 15:22:44.071013086 +0200
Change: 2020-02-10 15:22:44.071013086 +0200
 Birth: 
@George3d6
Author

Note, I tried the same thing with a .vcf file and I get the exact same issue with the same amount of logs.

@George3d6
Author

With this command instead, ./akt kin --force -M 1 input.bcf > kinship.txt, I now get this error message:

No frequency VCF provided (-F). Allele frequencies will be estimated from the data.
Problem opening input.bcf
Input file not found.

Which is even weirder, since the input.bcf file is most certainly present. Using absolute paths doesn't seem to help.

@jaredo
Contributor

jaredo commented Feb 10, 2020

Apologies for the confusing error message. AKT requires indexed files, so if you run bcftools index input.bcf these problems should go away.

best,

Jared
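
Since "Problem opening" gives no hint that a missing index is the cause, a small pre-flight check can save some guessing. The sketch below is not part of AKT: `has_index` is a hypothetical helper name, and it assumes the index sits next to the data file under the default `.csi` or `.tbi` suffix.

```shell
#!/bin/sh
# Hypothetical helper (not part of akt): succeed if an index file sits next to "$1".
# Assumes default index names: FILE.csi (bcftools index) or FILE.tbi (tabix, bcftools index -t).
has_index() {
    [ -e "$1.csi" ] || [ -e "$1.tbi" ]
}

# Example use before calling akt:
#   has_index input.bcf || bcftools index input.bcf
```

This only tests that an index file exists, not that it is up to date with the data file.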

@George3d6
Author

Hmh,

It might be that I converted to bcf poorly, since I got a different error after doing that.

However, I tried converting my original file (56001801065146A.snp.vcf) into an appropriate format via:

  1. bgzip 56001801065146A.snp.vcf
  2. bcftools index 56001801065146A.snp.vcf.gz

But upon running ./akt pca -W data/wgs.grch37.vcf.gz 56001801065146A.snp.vcf.gz I now get this error:

Input: 56001801065146A.snp.vcf.gz
Using file data/wgs.grch37.vcf.gz for PCA weights
1 samples
Using 20 PCs from input file.
0/17491 of sites were in 56001801065146A.snp.vcf.gz
ERROR: less that 90% of sites in data/wgs.grch37.vcf.gz were NOT in data/wgs.grch37.vcf.gz

(Same issue if I use --assume-homref)

Is this to be expected if my vcf file only contains full genome sequence data and not mitochondrial DNA data?

It does contain 150 or so SNPs that are Y-chromosome haplogroup related, so I assumed this would be correct.

Or might there be something wrong with the way I did my indexing?

@jaredo
Contributor

jaredo commented Feb 10, 2020

wgs.grch37.vcf.gz contains 17,491 common autosomal variants that should be detected in any high-coverage whole-genome-sequenced human (excluding homozygous reference). It won't matter if MT/X/Y variants are in your VCF; they will just be ignored.

What reference genome are you using? It needs to be consistent with the build of the -W VCF. Loadings VCFs are included for hg19 and hg38 (both with and without the chr prefix).

@George3d6
Author

I am using a VCF I got from Dante Labs ~10 months ago. Is there a standard way to check the "versioning" on those? I'm not too familiar with the file format to be honest; every time I think I understand how it works, something pops up and I realize I don't.
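
One way to look for the build is to read the VCF header and the first record. The following is a rough sketch rather than a standard tool: the function names `vcf_cat`, `vcf_reference_hints` and `vcf_chrom_style` are made up for illustration, and it assumes a plain or (b)gzipped text VCF (a BCF would first need converting with bcftools). Many VCFs carry `##reference` or `##contig` header lines, but not all do, so an empty result is inconclusive.

```shell
#!/bin/sh
# Print a VCF as text; handles plain .vcf and (b)gzipped .vcf.gz only, not BCF.
vcf_cat() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Show header lines that usually name the reference build or list contig lengths.
vcf_reference_hints() {
    vcf_cat "$1" | awk '/^##(reference|contig)/ { print } /^#CHROM/ { exit }'
}

# Report whether the first data record uses a "chr" prefix (chr1) or not (1).
vcf_chrom_style() {
    vcf_cat "$1" | awk '!/^#/ { print (($1 ~ /^chr/) ? "chr-prefixed" : "no-prefix"); exit }'
}

# Example use:
#   vcf_reference_hints 56001801065146A.snp.vcf.gz
#   vcf_chrom_style 56001801065146A.snp.vcf.gz
```

If a `##contig` line for chromosome 1 is present, its length tells the builds apart: 249250621 for GRCh37/hg19 and 248956422 for GRCh38/hg38.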

@George3d6
Author

I seem to have gotten matches on some (130 sites) with data/wgs.hg38.vcf.gz and on 9960/17491 with data/wgs.hg19.vcf.gz.

Do you have any further documentation that explains the difference between the files and why matches might be found only on some of those?
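
To compare a VCF against each loadings file outside of akt, the overlap can be counted directly. This is a sketch under a simplifying assumption: sites are matched on chromosome name and position only, which may differ from how akt matches them (it may also compare alleles). `vcf_keys` and `site_overlap` are hypothetical names, and only plain or (b)gzipped text VCFs are handled.

```shell
#!/bin/sh
# Print CHROM:POS for every record of a plain or (b)gzipped VCF (not BCF).
vcf_keys() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk '!/^#/ { print $1 ":" $2 }'
}

# site_overlap SITES.vcf INPUT.vcf
# Prints "matched/total": how many sites of SITES also occur in INPUT.
# Position-only matching is an assumption, not akt's documented behaviour.
site_overlap() {
    tmp=$(mktemp) || return 1
    vcf_keys "$1" > "$tmp"    # the site list is small, so hold it in memory
    vcf_keys "$2" | awk -v sites="$tmp" '
        BEGIN { while ((getline k < sites) > 0) { want[k] = 1; total++ } }
        ($0 in want) && !($0 in seen) { seen[$0] = 1; hit++ }
        END { printf "%d/%d\n", hit, total }'
    rm -f "$tmp"
}

# Example use:
#   site_overlap data/wgs.hg19.vcf.gz 56001801065146A.snp.vcf.gz
```

Coordinates differ between genome builds and chromosome names differ between the chr and non-chr conventions, so a mismatch in either one drives the overlap towards zero. For scale, 9960/17491 is about 57% of the sites.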

Anyway, thanks for all the help, hopefully I can handle the rest from here :)
