KeyError #9

Open
KasperH2 opened this issue May 20, 2021 · 15 comments

Comments

@KasperH2

KasperH2 commented May 20, 2021

Hello
I tried using your program, but I keep getting this error:

Traceback (most recent call last):
  File "/home/pato/miniconda3/envs/vcf_anno/bin/vcf-annotator", line 393, in <module>
    annotator.annotate_vcf_records()
  File "/home/pato/miniconda3/envs/vcf_anno/bin/vcf-annotator", line 67, in annotate_vcf_records
    self.__gb.accession = record.CHROM
  File "/home/pato/miniconda3/envs/vcf_anno/bin/vcf-annotator", line 220, in accession
    self.__gb = self.records[value]
KeyError: 'K02718.1'

I am using a viral reference FASTA and VCF, and the name in the chromosome column of the VCF is K02718.1.
I used the command:
vcf-annotator K02718.1.align.vcf K02718.1.gb

I've tried VCF files generated by freebayes and by GATK HaplotypeCaller.

I also tried changing the CHROM column to a single number, which gives a KeyError reporting that number instead.
The VCF looks like this:

image

Do you have an idea of the problem?
Thank you very much for your program and help!

@rpetit3
Owner

rpetit3 commented Jun 15, 2021

@KasperH2 I apologize for the delay; you caught me during a cross-country move.

Please let me know if you want me to look into this further.

@marimaro

marimaro commented Aug 6, 2021

Hello,

I'm facing the same problem. Did you manage to find a solution?

Thanks.

Marina

@rpetit3
Owner

rpetit3 commented Aug 6, 2021

Hello!

Can I get a VCF and GenBank file to figure this out?

Thank you!

@rpetit3
Owner

rpetit3 commented Aug 13, 2021

Hi @marimaro or @KasperH2

Just following up to see if you had a VCF and GenBank file you could share.

Thank you!

@marimaro

Hi Robert,

Sorry for the delay! In the end I managed to run vcf-annotator.

However, when I tried running it with a GenBank file downloaded directly from NCBI (this one), it didn't work, but when I made a GenBank file from a FASTA and GFF using seqret, it worked.

Also, just for the record, my VCF files had some '*' characters that I had to remove to successfully run vcf-annotator.

Thanks for your time,

@rpetit3
Owner

rpetit3 commented Aug 13, 2021

Thank you for bringing up the '*' characters. I'll work on getting that replaced.

I'll also play around with the GenBank file.
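In the meantime, here is a rough workaround sketch for the '*' characters, not what vcf-annotator does. It assumes the '*' appears as an ALT allele (the VCF 4.2+ marker for an allele missing because of an overlapping deletion) and that the VCF is tab-delimited. The function name `drop_star_alleles` is made up for this example. Note that dropping an ALT allele can leave genotype indices and per-allele INFO values out of sync, so this is only reasonable for simple, sites-only files.

```python
def drop_star_alleles(vcf_lines):
    """Yield VCF lines with '*' ALT alleles removed.

    Records whose only ALT allele is '*' are skipped entirely.
    Header lines and malformed short lines pass through unchanged.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            yield line
            continue
        # Column 5 (index 4) is ALT; multiple alleles are comma-separated.
        alts = [allele for allele in fields[4].split(",") if allele != "*"]
        if not alts:
            continue
        fields[4] = ",".join(alts)
        yield "\t".join(fields) + "\n"
```

Usage would be along the lines of opening the input VCF, passing the file object to `drop_star_alleles`, and writing the yielded lines to a new file with `writelines`.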

@rpetit3
Owner

rpetit3 commented Aug 13, 2021

@marimaro last request: do you by chance have a VCF that I could test? If not, that's OK; I'll make a fake one that should match your issues.

@rpetit3
Owner

rpetit3 commented Aug 13, 2021

Alright, so I think the issue is that vcf-annotator expects the CHROM field in the VCF to match the ACCESSION in the GenBank file, but in your VCFs the CHROM matches the VERSION.

ACCESSION   K02718
VERSION     K02718.1

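So what I'm thinking is that I'll add a check, something like this (the dictionary names are placeholders):

try:
    # Look up the VCF CHROM value by GenBank ACCESSION first (e.g. K02718)
    record = records_by_accession[chrom]
except KeyError:
    # Fall back to the GenBank VERSION (e.g. K02718.1)
    record = records_by_version[chrom]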

But it would be really useful to have a good example VCF.
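To make the idea concrete, here is a minimal standalone sketch of the lookup, not the actual vcf-annotator code. It reads the ACCESSION and VERSION lines straight from the GenBank header text and registers both as keys, so a CHROM value in either form resolves to the same record. The names `index_genbank_ids` and `resolve_chrom` are made up for this example.

```python
GENBANK_HEADER = """\
ACCESSION   K02718
VERSION     K02718.1
//
"""


def index_genbank_ids(genbank_text):
    """Map every ACCESSION and VERSION identifier to its record's accession."""
    index = {}
    accession = None
    for line in genbank_text.splitlines():
        parts = line.split()
        if line.startswith("ACCESSION") and len(parts) > 1:
            accession = parts[1]
            index[accession] = accession
        elif line.startswith("VERSION") and len(parts) > 1 and accession:
            # VERSION is the accession plus a ".N" suffix, e.g. K02718.1
            index[parts[1]] = accession
        elif line.startswith("//"):
            # End of record; the next ACCESSION starts a new one.
            accession = None
    return index


def resolve_chrom(chrom, index):
    """Return the accession for a VCF CHROM value, or raise a descriptive KeyError."""
    try:
        return index[chrom]
    except KeyError:
        known = ", ".join(sorted(index))
        raise KeyError(
            f"CHROM {chrom!r} matches no ACCESSION or VERSION (known: {known})"
        ) from None
```

With the header above, both `K02718` and `K02718.1` resolve to `K02718`, while an unrelated CHROM such as `1` still fails, but with a message that lists the identifiers the GenBank file actually contains.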

@rpetit3
Owner

rpetit3 commented Aug 13, 2021

Ok, I think I fixed the KeyError issue.

Example VCF

##fileformat=VCFv4.1
##contig=<ID=1,length=29903>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
NC_045512.2     25      .       T       G       .       .       .
NC_045512.2     241     .       C       T       .       .       .
NC_045512.2     512     .       C       T       .       .       .
NC_045512.2     514     .       T       C       .       .       .
NC_045512.2     520     .       G       T       .       .       .

Example Genbank

LOCUS       NC_045512              29903 bp ss-RNA     linear   VRL 18-JUL-2020
DEFINITION  Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1,
            complete genome.
ACCESSION   NC_045512
VERSION     NC_045512.2
DBLINK      BioProject: PRJNA485481
KEYWORDS    RefSeq.

Output annotated VCF

##fileformat=VCFv4.1
##INFO=<ID=RefCodon,Number=.,Type=String,Description="Reference codon">
##INFO=<ID=AltCodon,Number=.,Type=String,Description="Alternate codon">
##INFO=<ID=RefAminoAcid,Number=.,Type=String,Description="Reference amino acid">
##INFO=<ID=AltAminoAcid,Number=.,Type=String,Description="Alternate amino acid">
##INFO=<ID=CodonPosition,Number=1,Type=Integer,Description="Codon position in the gene">
##INFO=<ID=SNPCodonPosition,Number=1,Type=Integer,Description="SNP position in the codon">
##INFO=<ID=AminoAcidChange,Number=.,Type=String,Description="Amino acid change">
##INFO=<ID=IsSynonymous,Number=1,Type=Integer,Description="0:nonsynonymous, 1:synonymous, 9:N/A or Unknown">
##INFO=<ID=IsTransition,Number=1,Type=Integer,Description="0:transversion, 1:transition, 9:N/A or Unknown">
##INFO=<ID=IsGenic,Number=1,Type=Integer,Description="0:intergenic, 1:genic">
##INFO=<ID=IsPseudo,Number=1,Type=Integer,Description="0:not pseudo, 1:pseudo gene">
##INFO=<ID=LocusTag,Number=.,Type=String,Description="Locus tag associated with gene">
##INFO=<ID=Gene,Number=.,Type=String,Description="Name of gene">
##INFO=<ID=Note,Number=.,Type=String,Description="Note associated with gene">
##INFO=<ID=Inference,Number=.,Type=String,Description="Inference of feature.">
##INFO=<ID=Product,Number=.,Type=String,Description="Description of gene">
##INFO=<ID=ProteinID,Number=.,Type=String,Description="Protein ID of gene">
##INFO=<ID=Comments,Number=.,Type=String,Description="Example: Negative strand: T->C">
##INFO=<ID=VariantType,Number=.,Type=String,Description="Indel, SNP, Ambiguous_SNP">
##INFO=<ID=FeatureType,Number=.,Type=String,Description="The feature type of variant.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=1,length=29903>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
NC_045512.2     25      .       T       G       .       .       RefCodon=.;AltCodon=.;RefAminoAcid=.;AltAminoAcid=.;CodonPosition=.;SNPCodonPosition=.;AminoAcidChange=.;IsSynonymous=9;IsTransition=0;IsGenic=0;IsPseudo=0;LocusTag=.;Gene=.;Note=.;Inference=.;Product=.;ProteinID=.;Comments=.;VariantType=SNP;FeatureType=inter_genic
NC_045512.2     241     .       C       T       .       .       RefCodon=.;AltCodon=.;RefAminoAcid=.;AltAminoAcid=.;CodonPosition=.;SNPCodonPosition=.;AminoAcidChange=.;IsSynonymous=9;IsTransition=1;IsGenic=0;IsPseudo=0;LocusTag=.;Gene=.;Note=.;Inference=.;Product=.;ProteinID=.;Comments=.;VariantType=SNP;FeatureType=inter_genic
NC_045512.2     512     .       C       T       .       .       RefCodon=CAT;AltCodon=TAT;RefAminoAcid=H;AltAminoAcid=Y;CodonPosition=83;SNPCodonPosition=0;AminoAcidChange=H83Y;IsSynonymous=0;IsTransition=1;IsGenic=1;IsPseudo=0;LocusTag=GU280_gp01;Gene=ORF1ab;Note=pp1a;Inference=.;Product=ORF1a[space]polyprotein;ProteinID=YP_009725295.1;Comments=.;VariantType=SNP;FeatureType=CDS
NC_045512.2     514     .       T       C       .       .       RefCodon=CAT;AltCodon=CAC;RefAminoAcid=H;AltAminoAcid=H;CodonPosition=83;SNPCodonPosition=2;AminoAcidChange=H83H;IsSynonymous=1;IsTransition=1;IsGenic=1;IsPseudo=0;LocusTag=GU280_gp01;Gene=ORF1ab;Note=pp1a;Inference=.;Product=ORF1a[space]polyprotein;ProteinID=YP_009725295.1;Comments=.;VariantType=SNP;FeatureType=CDS
NC_045512.2     520     .       G       T       .       .       RefCodon=ATG;AltCodon=ATT;RefAminoAcid=M;AltAminoAcid=I;CodonPosition=85;SNPCodonPosition=2;AminoAcidChange=M85I;IsSynonymous=0;IsTransition=0;IsGenic=1;IsPseudo=0;LocusTag=GU280_gp01;Gene=ORF1ab;Note=pp1a;Inference=.;Product=ORF1a[space]polyprotein;ProteinID=YP_009725295.1;Comments=.;VariantType=SNP;FeatureType=CDS
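As a quick sanity check on the IsTransition values above, here is a small sketch of how the 0/1/9 coding described in the header could be reproduced. This is an independent illustration, not the vcf-annotator implementation, and the function name `is_transition` is made up for this example. A transition swaps a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); anything else between two different bases is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return 1 for a transition, 0 for a transversion, 9 if not a simple SNP."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        return 9
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return 1 if same_class else 0


# REF/ALT pairs from the five example records (positions 25, 241, 512, 514, 520)
example_snps = [("T", "G"), ("C", "T"), ("C", "T"), ("T", "C"), ("G", "T")]
results = [is_transition(ref, alt) for ref, alt in example_snps]
```

For the five example records this gives 0, 1, 1, 1, 0, which agrees with the IsTransition fields in the annotated output.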

@rpetit3
Owner

rpetit3 commented Aug 13, 2021

I will need an example of a VCF with '*' in it, unless this is a good example: #6 (comment)

@marimaro

Awesome!
Yes, that is a good example of what I had in my VCF files.

@BioWilko

Here's another example, if you're still looking into this, @rpetit3:
calls.vcf.txt

@rpetit3
Owner

rpetit3 commented Sep 13, 2021

Awesome, thank you @BioWilko!

@rpetit3
Owner

rpetit3 commented Sep 13, 2021

I just pushed v0.7 with a fix for this issue: https://github.com/rpetit3/vcf-annotator/releases/tag/v0.7

Please let me know if that's not the case!

@rpetit3
Owner

rpetit3 commented Sep 13, 2021

To be clear, v0.7 fixes the KeyError issue specifically, not the '*' issue.
