Question: Vcf to VRanges conversion error: ["incorrect number of dimensions"]
0
gravatar for maria.vila
2.9 years ago by
maria.vila0
maria.vila0 wrote:

Hi all!

I'm trying to read a vcf file with readVcfAsVRanges but I get the following error:(readVcf works, but then I get the error when I want to convert it to an VRanges object)

 

a <- readVcfAsVRanges("myprovalim.vcf",genome="hg19")
Error in ad[, , 1, drop = FALSE] : incorrect number of dimensions

 

I am using  SomaticSignatures_2.6.1  and   VariantAnnotation_1.16.4                   
 

This is the original file:

 

##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NORMAL    TUMOR
chr1    762273    .    G    A    .    PASS    DP=223;SS=1;SSC=0;GPV=5.8167E-133;SPV=1E0    GT:GQ:DP:RD:AD:FREQ:DP4    1/1:.:105:0:105:100%:0,0,7,98    1/1:.:118:0:117:100%:0,0,10,107
chr1    808922    .    G    A    .    PASS    DP=21;SS=1;SSC=0;GPV=1.8578E-12;SPV=1E0    GT:GQ:DP:RD:AD:FREQ:DP4    1/1:.:12:0:12:100%:0,0,10,2    1/1:.:9:0:9:100%:0,0,8,1

 

Does anyone know what is happening? Can readVcfAsVRanges read vcf's with more than one sample? Any help would be appreciated!

 

Many thanks,

Maria

ADD COMMENTlink modified 2.9 years ago by Michael Lawrence11k • written 2.9 years ago by maria.vila0
Answer: Vcf to VRanges conversion error: ["incorrect number of dimensions"]
0
gravatar for Michael Lawrence
2.9 years ago by
United States
Michael Lawrence11k wrote:

We're assuming AD represents the allele depth, i.e., the GATK convention. I just committed VariantAnnotation 1.18.3 that attempts to handle the Varscan2 convention. Please try once it appears.

ADD COMMENTlink written 2.9 years ago by Michael Lawrence11k

Maria, please note that you will need to update your Bioconductor installation in order to profit from this patch. You are very likely running an outdated and unsupported version of Bioc, judging from the version 1.16.4 of VariantAnnotation. 

ADD REPLYlink written 2.9 years ago by Julian Gehring1.3k

Could be also a problem with he Ubuntu version? I've just updated R and Bioconductor and installed VariantAnnotation 1.19.5 (should this also handle Varscan2?), and I'm getting the same error:

 

> a <- readVcfAsVRanges("myprovalim.vcf",genome="hg19")
Error in ad[, , 1, drop = FALSE] : incorrect number of dimensions
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
 [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
 [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] VariantAnnotation_1.19.5   Rsamtools_1.24.0          
 [3] Biostrings_2.40.2          XVector_0.12.0            
 [5] SummarizedExperiment_1.2.3 Biobase_2.32.0            
 [7] GenomicRanges_1.24.2       GenomeInfoDb_1.8.1        
 [9] IRanges_2.6.1              S4Vectors_0.10.1          
[11] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] XML_3.98-1.4            GenomicAlignments_1.8.3 bitops_1.0-6           
 [4] DBI_0.4-1               RSQLite_1.0.0           GenomicFeatures_1.24.3
 [7] zlibbioc_1.18.0         BiocParallel_1.6.2      tools_3.3.1            
[10] BSgenome_1.40.1         biomaRt_2.28.0          RCurl_1.95-4.8         
[13] rtracklayer_1.32.1      AnnotationDbi_1.34.3   

ADD REPLYlink written 2.9 years ago by maria.vila0
1

No you have to wait for 1.19.7 to show up.

ADD REPLYlink written 2.9 years ago by Michael Lawrence11k

Also, your packages are a mixture of 'release' (even-numbered bioc) and 'devel'; unless the fix has been ported to the release branch, you'll have to follow http://bioconductor.org/developers/how-to/useDevel/ to make sure you are using the devel branch. Use BiocInstaller::biocValid() to validate that your packages are consistent.

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Martin Morgan ♦♦ 23k
2

It was ported to 1.18.3 so once things are consistent release should work.

ADD REPLYlink written 2.9 years ago by Michael Lawrence11k

Many thanks!! It's working now with version 1.18.3
But I've another error:


a<-readVcfAsVRanges("myprova2.vcf",genome="hg19")

Error in if ((length(unique(elt)) != 1L) && (dim(AD)[3] != unique(elt))) { :
  missing value where TRUE/FALSE needed


It's coming from this record (or similar ones: two alternative alleles)

chr1    808922    .    G    A,T    .    PASS    DP=21;SS=1;SSC=0;GPV=1.8578E-12;SPV=1E0    GT:GQ:DP:RD:AD:FREQ:DP4    1/1:.:12:0:12:100%:0,0,10,2    2/2:.:9:0:9:100%:0,0,8,1

Does the package handle this cases?

ADD REPLYlink written 2.9 years ago by maria.vila0

There is a bug in the package that causes this error, but the root problem is that the AD field is defined in the header with Number=1, while it should be Number=A, if there is one value per alt allele. I also noticed that the Number for DP4 is wrong. It should be "." not 1.

With the upcoming VariantAnnotation 1.18.5, using this invalid file, only the counts for the first allele will be used, even for the second allele, which is obviously wrong. Ideally, the VCF parser would throw an error or warning if it encounters multiple values for a Number=1 field. Right now, it silently takes the first. Maybe one of the authors of the parser could consider that.

ADD REPLYlink written 2.9 years ago by Michael Lawrence11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 103 users visited in the last hour