VariantAnnotation_1.8.13 was writing records correctly, but VariantAnnotation_1.12.3 no longer writes ungenotyped data to disk. The VCF headers are successfully written but variants are not. The source code indicates the problem is that VariantAnnotation:::.makeVcfGeno does not write the records to disk as they are flushed by the C .make_vcf_geno function which is only called if genotypes exist.
As a workaround, I am using writeLines(c(VariantAnnotation:::.makeVcfHeader(vcf), VariantAnnotation:::.makeVcfMatrix(NULL, vcf)), file) instead of writeVcf(vcf, file) but a proper fix would be very much appreciated.
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] VariantAnnotation_1.12.3 Rsamtools_1.18.1 Biostrings_2.34.0
[4] XVector_0.6.0 GenomicRanges_1.18.1 GenomeInfoDb_1.2.2
[7] IRanges_2.0.0 S4Vectors_0.4.0 BiocGenerics_0.12.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.1 BBmisc_1.8 BSgenome_1.34.0
[4] BatchJobs_1.5 Biobase_2.26.0 BiocParallel_1.0.0
[7] DBI_0.3.1 GenomicAlignments_1.2.1 GenomicFeatures_1.18.2
[10] RCurl_1.95-4.3 RSQLite_1.0.0 XML_3.98-1.1
[13] base64enc_0.1-2 biomaRt_2.22.0 bitops_1.0-6
[16] brew_1.0-6 checkmate_1.5.0 codetools_0.2-9
[19] digest_0.6.4 fail_1.2 foreach_1.4.2
[22] iterators_1.0.7 rtracklayer_1.26.1 sendmailR_1.2-1
[25] stringr_0.6.2 tools_3.1.2 zlibbioc_1.12.0
No GT field at all. The following script reproduces the problem:
library(VariantAnnotation)
f <- file("test.vcf")
writeLines(c(
"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
"chr1\t1\tvariant1\tA\t[chr1:1[a\t1\t.\tATTR=value",
"chr2\t2\tvariant2\tA\t[chr2:2[a\t2\t.\tATTR=value"
), f)
close(f)
vcf <- readVcf("test.vcf", "hg19")
writeVcf(vcf, "test-unchanged-expect-2-records.vcf")
writeVcf(vcf[fixed(vcf)$QUAL==1,], "test-subset-expect-1-record.vcf")
Thanks for the example. Now fixed in release (1.12.4) and devel (1.13.11).
The bug was a hold over from when data were written out from R. As you've seen, we've since moved the writing down to C. Thanks for reporting the bug.
Valerie
I am using "VariantAnnotation_1.12.9", I am reading in VCF using readVcf, then writing out using writeVcf, and no-call genotypes get converted to ref, ie.: "./." becomes "0/0", is this normal?
I don't see that behavior in release (1.12.9) or devel (1.13.25). No-call genotypes are read in as a single dot '.' and written out the same; they are not replaced by '0/0'.
In this test file all GT are './.':
This has always been the behavior in read/writeVcf(). If there is a need/request, I can look into reading and writing as './.' instead of the single dot.
If you have a reproducible example of where './.' is written as '0/0' please post it or send to me off line.
Valerie