seqVCF2GDS Error Converting VCF to GDS file

Dear Bioconductor:

I am a student in SISG Module 17 and used the following code to convert my VCF file to a GDS file:

vcffile <- "data/72S1.vcf.gz"
gdsfile <- "data/72S1.gds"
seqVCF2GDS(vcffile, gdsfile, fmt.import="GT", storage.option="LZMA_RA", verbose=FALSE)

The VCF file was generated from human whole-exome sequencing (WES) using Illumina's Enrichment app. It contains a single patient.

I received the following error message.

Error in seqVCF2GDS(vcffile, gdsfile, fmt.import = "GT", storage.option = "LZMA_RA", :
  INFO ID 'GMAF' (Number=A) should have 0 value(s), but receives 1.
FILE: C:\Users\winst\Documents\data\72S1.vcf.gz
LINE: 160, COLUMN: 8, RefMinor;GMAF=C|0.04812;phyloP=-1.165;CSQT=1|DDX11L1|ENST00000456328|downstream_gene_variant,1|WASH7P|ENST00000438504|intron_variant&non_coding_transcript_variant

Please help.

Winston Dunn

software error SeqArray seqVCF2GDS • 512 views
University of Washington

seqVCF2GDS is particular about VCF files conforming to the VCF standard. In this case it looks like the header line for "GMAF" has "Number=A", which means there should be one value per alternate allele. The file itself appears to have a row where there is no alternate allele (hence seqVCF2GDS is expecting 0 values), but there is a value provided for "GMAF". You might be able to solve this just by modifying the header, which you can do in the VCF file itself, or by saving a separate file with just the header and modifying that instead. You could then specify that alternate header in seqVCF2GDS:

hdr <- seqVCF_Header("revised_header.vcf")
gdsfile <- seqVCF2GDS(vcffile, gdsfile, header=hdr)
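
If you go the separate-header-file route, the file could be produced along these lines. This is only a sketch in base R under a few assumptions: `write_revised_header` is a hypothetical helper (not a SeqArray function), the whole header fits within the first `max_lines` lines, `gzfile()` can read your compressed file, and the GMAF definition starts with `##INFO=<ID=GMAF,`.

```r
# Hypothetical helper (not part of SeqArray): copy the header of a VCF file
# to a separate file, relaxing Number=A to Number=. for one INFO field.
write_revised_header <- function(vcf_path, out_path, id = "GMAF",
                                 max_lines = 5000L) {
  con <- gzfile(vcf_path, open = "rt")  # gzfile() also reads uncompressed text
  on.exit(close(con))
  lines <- readLines(con, n = max_lines)
  # Keep the '##' meta lines and the '#CHROM' column line (assumed to be
  # within the first max_lines lines)
  header <- lines[startsWith(lines, "#")]
  # Touch only the INFO definition of the requested ID
  target <- startsWith(header, paste0("##INFO=<ID=", id, ","))
  header[target] <- sub("Number=A", "Number=.", header[target], fixed = TRUE)
  writeLines(header, out_path)
  invisible(out_path)
}

# Example call (paths are illustrative):
# write_revised_header("data/72S1.vcf.gz", "revised_header.vcf")
```

The file written this way is what would be passed to seqVCF_Header in the two lines above. "Number=." tells the reader that the number of values is unknown or variable, so rows without an alternate allele no longer conflict with the header.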

Thank you, Stephanie! Illumina BaseSpace provides two apps for generating VCF files, "Enrichment" and "BWA Enrichment", which cost exactly the same. When I generated the VCF file with BWA Enrichment, the problem did not occur.

zhengx ▴ 30
United States

You can directly modify the header in R:

hdr <- seqVCF_Header("data/72S1.vcf.gz")
hdr$info$Number[hdr$info$ID == "GMAF"] <- "."

gdsfile <- seqVCF2GDS(vcffile, gdsfile, header=hdr)
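
To get a feel for how many records disagree with "Number=A" before relaxing the header, a rough check in base R might look like the following. It is a sketch, not SeqArray functionality: `count_number_a_mismatches` is a hypothetical helper, and it assumes well-formed, tab-separated data lines with ALT in column 5 and INFO in column 8.

```r
# Hypothetical helper (not part of SeqArray): for each VCF data line, compare
# the number of ALT alleles with the number of comma-separated values given
# for an INFO key that the header declares as Number=A.
count_number_a_mismatches <- function(data_lines, key = "GMAF") {
  fields <- strsplit(data_lines, "\t", fixed = TRUE)
  mismatch <- vapply(fields, function(f) {
    alt <- f[5]                                     # column 5 is ALT
    n_alt <- if (alt == ".") 0L else
      length(strsplit(alt, ",", fixed = TRUE)[[1]])
    info <- strsplit(f[8], ";", fixed = TRUE)[[1]]  # column 8 is INFO
    hit <- info[startsWith(info, paste0(key, "="))]
    if (length(hit) == 0L) return(FALSE)            # key absent: nothing to check
    value <- substring(hit[1], nchar(key) + 2L)     # text after "key="
    n_val <- length(strsplit(value, ",", fixed = TRUE)[[1]])
    n_val != n_alt
  }, logical(1))
  which(mismatch)  # indices of the lines that violate Number=A
}

# Example call on the non-header lines of a file already read into `lines`:
# count_number_a_mismatches(lines[!startsWith(lines, "#")])
```

A line such as the one in the error message (no alternate allele, but one GMAF value) would be reported by this check.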

