I've noticed that the inclusion of a "##FORMAT=<ID=FTZ,Nu..."
line in a vcf file seems to break the readVCFAsVRanges()
function (although not the readVCF()
function). CaVEMan, for example, includes such a line in its vcf files. Replacing all instances of FTZ with, e.g., FUZ in the vcf file avoids the problem, but is not a desirable solution.
This can be illustrated with two toy vcf files, the only difference of which is the FTZ/FUZ switch. [Variant location and nature have been changed to ensure that no patient data are presented.]
FTZtest.vcf:
##fileformat=VCFv4.1 ##fileDate=20160629 ##reference=/datastore/reference_files/genome.fa ##vcfProcessLog=<InputVCF=<.>,InputVCFSource=<CaVEMan>,InputVCFParam=<NORMAL_CONTAMINATION=0.1,PRIOR_MUT_RATE=6e-06,REF_BIAS=0.95,SNP_CUTOFF=0.95,MUT_CUTOFF=0.8,PRIOR_SNP_RATE=0.0001>> ##cavemanVersion=1.9.2 ##contig=<ID=1,assembly=GRCh37,length=249250621,species=HUMAN> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth"> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##FORMAT=<ID=FTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand"> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR 1 16074 ID C A . PASS DP=61 GT:FTZ 0|0:20 0|1:15
FUZtest.vcf:
##fileformat=VCFv4.1 ##fileDate=20160629 ##reference=/datastore/reference_files/genome.fa ##vcfProcessLog=<InputVCF=<.>,InputVCFSource=<CaVEMan>,InputVCFParam=<NORMAL_CONTAMINATION=0.1,PRIOR_MUT_RATE=6e-06,REF_BIAS=0.95,SNP_CUTOFF=0.95,MUT_CUTOFF=0.8,PRIOR_SNP_RATE=0.0001>> ##cavemanVersion=1.9.2 ##contig=<ID=1,assembly=GRCh37,length=249250621,species=HUMAN> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth"> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##FORMAT=<ID=FUZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand"> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOUR 1 16074 ID C A . PASS DP=61 GT:FUZ 0|0:20 0|1:15
Running the command readVcfAsVRanges("FUZtest.vcf","hg19")
works as expected, but readVcfAsVRanges("FTZtest.vcf","hg19")
returns an error.
> vcf1<-readVcfAsVRanges("FTZtest.vcf","hg19")
Error in strsplit(x, ";", fixed = TRUE) : non-character argument
> traceback()
10: strsplit(x, ";", fixed = TRUE)
9: parseFilterStrings(as.vector(geno(from)$FT))
8: eval(expr, envir, enclos)
7: eval(quote(list(...)), env)
6: eval(quote(list(...)), env)
5: standardGeneric("cbind")
4: cbind(filter, parseFilterStrings(as.vector(geno(from)$FT)))
3: asMethod(object)
2: as(readVcf(x, genome, param = param, row.names = use.names, ...),
"VRanges")
1: readVcfAsVRanges("FTZtest.vcf",
"hg19")
Anyway. I can't work out why this should be a problem, so thought I would share. Apologies if I have missed previous references to this.
Andy Lynch (CRUK Cambridge Institute)
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8 LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] VariantAnnotation_1.18.5 BiocInstaller_1.22.3 QDNAseq_1.8.0 mclust_5.2 knitr_1.13
[6] GenomicAlignments_1.8.4 Rsamtools_1.24.0 Biostrings_2.40.2 XVector_0.12.0 SummarizedExperiment_1.2.3
[11] Biobase_2.32.0 GenomicRanges_1.24.2 GenomeInfoDb_1.8.1 IRanges_2.6.1 S4Vectors_0.10.2
[16] BiocGenerics_0.18.0
loaded via a namespace (and not attached):
[1] formatR_1.4 highr_0.6 R.methodsS3_1.7.1 GenomicFeatures_1.24.3 bitops_1.0-6 R.utils_2.3.0
[7] tools_3.3.1 zlibbioc_1.18.0 biomaRt_2.28.0 BSgenome_1.40.1 evaluate_0.9 RSQLite_1.0.0
[13] DBI_0.4-1 rtracklayer_1.32.1 stringr_1.0.0 CGHbase_1.32.0 impute_1.46.0 marray_1.50.0
[19] DNAcopy_1.46.0 AnnotationDbi_1.34.4 XML_3.98-1.4 CGHcall_2.34.0 BiocParallel_1.6.2 limma_3.28.14
[25] magrittr_1.5 matrixStats_0.50.2 BiocStyle_2.0.2 stringi_1.1.1 RCurl_1.95-4.8 R.oo_1.20.0