Search
Question: readVCFAsVRanges in VariantAnnotation broken by an FTZ format?
0
gravatar for Andy Lynch
15 months ago by
Andy Lynch80
United Kingdom
Andy Lynch80 wrote:

I've noticed that the inclusion of a "##FORMAT=<ID=FTZ,Nu..." line in a vcf file seems to break the readVCFAsVRanges() function (although not the readVCF() function). CaVEMan, for example, includes such a line in its vcf files. Replacing all instances of FTZ with, e.g., FUZ in the vcf file avoids the problem, but is not a desirable solution.
 

This can be illustrated with two toy vcf files, the only difference of which is the FTZ/FUZ switch. [Variant location and nature have been changed to ensure that no patient data are presented.]

FTZtest.vcf:

##fileformat=VCFv4.1
##fileDate=20160629
##reference=/datastore/reference_files/genome.fa
##vcfProcessLog=<InputVCF=<.>,InputVCFSource=<CaVEMan>,InputVCFParam=<NORMAL_CONTAMINATION=0.1,PRIOR_MUT_RATE=6e-06,REF_BIAS=0.95,SNP_CUTOFF=0.95,MUT_CUTOFF=0.8,PRIOR_SNP_RATE=0.0001>>
##cavemanVersion=1.9.2
##contig=<ID=1,assembly=GRCh37,length=249250621,species=HUMAN>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=FTZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NORMAL    TUMOUR
1    16074    ID    C    A    .    PASS    DP=61    GT:FTZ    0|0:20    0|1:15

FUZtest.vcf:

##fileformat=VCFv4.1
##fileDate=20160629
##reference=/datastore/reference_files/genome.fa
##vcfProcessLog=<InputVCF=<.>,InputVCFSource=<CaVEMan>,InputVCFParam=<NORMAL_CONTAMINATION=0.1,PRIOR_MUT_RATE=6e-06,REF_BIAS=0.95,SNP_CUTOFF=0.95,MUT_CUTOFF=0.8,PRIOR_SNP_RATE=0.0001>>
##cavemanVersion=1.9.2
##contig=<ID=1,assembly=GRCh37,length=249250621,species=HUMAN>
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=FUZ,Number=1,Type=Integer,Description="Reads presenting a T for this position, forward strand">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NORMAL    TUMOUR
1    16074    ID    C    A    .    PASS    DP=61    GT:FUZ    0|0:20    0|1:15

 

Running the command readVcfAsVRanges("FUZtest.vcf","hg19") works as expected, but readVcfAsVRanges("FTZtest.vcf","hg19") returns an error.

> vcf1<-readVcfAsVRanges("FTZtest.vcf","hg19")
Error in strsplit(x, ";", fixed = TRUE) : non-character argument
> traceback()
10: strsplit(x, ";", fixed = TRUE)
9: parseFilterStrings(as.vector(geno(from)$FT))
8: eval(expr, envir, enclos)
7: eval(quote(list(...)), env)
6: eval(quote(list(...)), env)
5: standardGeneric("cbind")
4: cbind(filter, parseFilterStrings(as.vector(geno(from)$FT)))
3: asMethod(object)
2: as(readVcf(x, genome, param = param, row.names = use.names, ...), 
       "VRanges")
1: readVcfAsVRanges("FTZtest.vcf", 
       "hg19")

 

Anyway. I can't work out why this should be a problem, so thought I would share.  Apologies if I have missed previous references to this.

Andy Lynch (CRUK Cambridge Institute)

 

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] VariantAnnotation_1.18.5   BiocInstaller_1.22.3       QDNAseq_1.8.0              mclust_5.2                 knitr_1.13                
 [6] GenomicAlignments_1.8.4    Rsamtools_1.24.0           Biostrings_2.40.2          XVector_0.12.0             SummarizedExperiment_1.2.3
[11] Biobase_2.32.0             GenomicRanges_1.24.2       GenomeInfoDb_1.8.1         IRanges_2.6.1              S4Vectors_0.10.2          
[16] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] formatR_1.4            highr_0.6              R.methodsS3_1.7.1      GenomicFeatures_1.24.3 bitops_1.0-6           R.utils_2.3.0         
 [7] tools_3.3.1            zlibbioc_1.18.0        biomaRt_2.28.0         BSgenome_1.40.1        evaluate_0.9           RSQLite_1.0.0         
[13] DBI_0.4-1              rtracklayer_1.32.1     stringr_1.0.0          CGHbase_1.32.0         impute_1.46.0          marray_1.50.0         
[19] DNAcopy_1.46.0         AnnotationDbi_1.34.4   XML_3.98-1.4           CGHcall_2.34.0         BiocParallel_1.6.2     limma_3.28.14         
[25] magrittr_1.5           matrixStats_0.50.2     BiocStyle_2.0.2        stringi_1.1.1          RCurl_1.95-4.8         R.oo_1.20.0       

 

ADD COMMENTlink modified 15 months ago by Michael Lawrence9.8k • written 15 months ago by Andy Lynch80
1
gravatar for Mike Smith
15 months ago by
Mike Smith2.1k
EMBL Heidelberg / de.NBI
Mike Smith2.1k wrote:

Hi Andy,

This looks like a partial matching problem to me.  The code explicitly looks for the genotype filter FT field, which isn't present in your data, so it then matches to your FTZ field and falls over when this isn't in the format it's expecting.

It should be possible to prevent this by using the [['FT']] method of accessing list entries rather than $FT

If you'd like to temporarily patch this yourself I think editing lines 252 & 253 of methods-VRanges-class.R should be sufficient.

ADD COMMENTlink modified 15 months ago • written 15 months ago by Mike Smith2.1k
1
gravatar for Michael Lawrence
15 months ago by
United States
Michael Lawrence9.8k wrote:

Should be fixed in 1.18.6.

ADD COMMENTlink written 15 months ago by Michael Lawrence9.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 151 users visited in the last hour