Question: readVCF with 1000 genomes data from 20130502 fails to return genotype information
0
gravatar for Simon Coetzee
4.4 years ago by
USA Cedars-Sinai Medical Center
Simon Coetzee50 wrote:

When running readVCF with 1000 genomes data from 20130502 no genotype is returned.

I believe this to be because readVCF looks for a description of the FORMAT field in the header of the vcf file, but in the latest version of 1000 genomes, this is not present. It seems based upon my reading of the specification listed here

http://samtools.github.io/hts-specs/VCFv4.1.pdf

that the FORMAT field does not have to be specified in the header if it may change from variant to variant.

The vcf files however do posses the entry GT under the field FORMAT for each variant that has a genotype call. They are actually simplified, because as far as I can tell they only specify GT and no other field.

here is an example that ought to be reproducible:

region:

loc <- GRanges(1, IRanges(232164611, 233164611))
params <- ScanVcfParam(which=loc)

old 1000 genomes data:

file <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr1.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz"
vcf <- readVcf(TabixFile(file), "hg19", params)
geno(vcf)

new 1000 genomes data:

file <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz"
vcf <- readVcf(TabixFile(file), "hg19", params)
geno(vcf)

One may confirm that the new 1000 genomes data does in fact contain GT information with tabix via:

tabix -h ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz 1:232164611-233164611 > temp.vcf

 

vcf readvcf tabix • 730 views
ADD COMMENTlink modified 4.4 years ago by Valerie Obenchain6.7k • written 4.4 years ago by Simon Coetzee50
Answer: readVCF with 1000 genomes data from 20130502 fails to return genotype informatio
2
gravatar for Valerie Obenchain
4.4 years ago by
United States
Valerie Obenchain6.7k wrote:

Hi Simon,

Yes, this is a known. See a similar discussion ReadVCF not reading samples. It's on the TODO to parse the 'standard' fields listed in 1.4.2 of the VCF spec even when they don't have a header line.

The interim solution is to simply add a header line for GT to your file. Variants without GT are read in as missing ('.|.' etc.).

Valerie

 

Update 8/13/15:

See updated answer C: ReadVCF not reading samples.

ADD COMMENTlink modified 4.3 years ago • written 4.4 years ago by Valerie Obenchain6.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 376 users visited in the last hour