Vcf to VRanges conversion error: ["incorrect number of dimensions"]
1
0
Entering edit mode
maria.vila • 0
@mariavila-9891
Last seen 7.8 years ago

Hi all!

I'm trying to read a vcf file with readVcfAsVRanges but I get the following error:(readVcf works, but then I get the error when I want to convert it to an VRanges object)

 

a <- readVcfAsVRanges("myprovalim.vcf",genome="hg19")
Error in ad[, , 1, drop = FALSE] : incorrect number of dimensions

 

I am using  SomaticSignatures_2.6.1  and   VariantAnnotation_1.16.4                   
 

This is the original file:

 

##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth of quality bases">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=SS,Number=1,Type=String,Description="Somatic status of variant (0=Reference,1=Germline,2=Somatic,3=LOH, or 5=Unknown)">
##INFO=<ID=SSC,Number=1,Type=String,Description="Somatic score in Phred scale (0-255) derived from somatic p-value">
##INFO=<ID=GPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor+normal versus no variant for Germline calls">
##INFO=<ID=SPV,Number=1,Type=Float,Description="Fisher's Exact Test P-value of tumor versus normal for Somatic/LOH calls">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=DP4,Number=1,Type=String,Description="Strand read counts: ref/fwd, ref/rev, var/fwd, var/rev">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NORMAL    TUMOR
chr1    762273    .    G    A    .    PASS    DP=223;SS=1;SSC=0;GPV=5.8167E-133;SPV=1E0    GT:GQ:DP:RD:AD:FREQ:DP4    1/1:.:105:0:105:100%:0,0,7,98    1/1:.:118:0:117:100%:0,0,10,107
chr1    808922    .    G    A    .    PASS    DP=21;SS=1;SSC=0;GPV=1.8578E-12;SPV=1E0    GT:GQ:DP:RD:AD:FREQ:DP4    1/1:.:12:0:12:100%:0,0,10,2    1/1:.:9:0:9:100%:0,0,8,1

 

Does anyone know what is happening? Can readVcfAsVRanges read vcf's with more than one sample? Any help would be appreciated!

 

Many thanks,

Maria

variantannotation somaticsignatures • 1.9k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States

We're assuming AD represents the allele depth, i.e., the GATK convention. I just committed VariantAnnotation 1.18.3 that attempts to handle the Varscan2 convention. Please try once it appears.

ADD COMMENT
0
Entering edit mode

Maria, please note that you will need to update your Bioconductor installation in order to profit from this patch. You are very likely running an outdated and unsupported version of Bioc, judging from the version 1.16.4 of VariantAnnotation. 

ADD REPLY
0
Entering edit mode

Could be also a problem with he Ubuntu version? I've just updated R and Bioconductor and installed VariantAnnotation 1.19.5 (should this also handle Varscan2?), and I'm getting the same error:

 

> a <- readVcfAsVRanges("myprovalim.vcf",genome="hg19")
Error in ad[, , 1, drop = FALSE] : incorrect number of dimensions
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)

locale:
 [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
 [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
 [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] VariantAnnotation_1.19.5   Rsamtools_1.24.0          
 [3] Biostrings_2.40.2          XVector_0.12.0            
 [5] SummarizedExperiment_1.2.3 Biobase_2.32.0            
 [7] GenomicRanges_1.24.2       GenomeInfoDb_1.8.1        
 [9] IRanges_2.6.1              S4Vectors_0.10.1          
[11] BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] XML_3.98-1.4            GenomicAlignments_1.8.3 bitops_1.0-6           
 [4] DBI_0.4-1               RSQLite_1.0.0           GenomicFeatures_1.24.3
 [7] zlibbioc_1.18.0         BiocParallel_1.6.2      tools_3.3.1            
[10] BSgenome_1.40.1         biomaRt_2.28.0          RCurl_1.95-4.8         
[13] rtracklayer_1.32.1      AnnotationDbi_1.34.3   

ADD REPLY
1
Entering edit mode

No you have to wait for 1.19.7 to show up.

ADD REPLY
0
Entering edit mode

Also, your packages are a mixture of 'release' (even-numbered bioc) and 'devel'; unless the fix has been ported to the release branch, you'll have to follow http://bioconductor.org/developers/how-to/useDevel/ to make sure you are using the devel branch. Use BiocInstaller::biocValid() to validate that your packages are consistent.

ADD REPLY
2
Entering edit mode

It was ported to 1.18.3 so once things are consistent release should work.

ADD REPLY
0
Entering edit mode

Many thanks!! It's working now with version 1.18.3
But I've another error:


a<-readVcfAsVRanges("myprova2.vcf",genome="hg19")

Error in if ((length(unique(elt)) != 1L) && (dim(AD)[3] != unique(elt))) { :
  missing value where TRUE/FALSE needed


It's coming from this record (or similar ones: two alternative alleles)

chr1    808922    .    G    A,T    .    PASS    DP=21;SS=1;SSC=0;GPV=1.8578E-12;SPV=1E0    GT:GQ:DP:RD:AD:FREQ:DP4    1/1:.:12:0:12:100%:0,0,10,2    2/2:.:9:0:9:100%:0,0,8,1

Does the package handle this cases?

ADD REPLY
0
Entering edit mode

There is a bug in the package that causes this error, but the root problem is that the AD field is defined in the header with Number=1, while it should be Number=A, if there is one value per alt allele. I also noticed that the Number for DP4 is wrong. It should be "." not 1.

With the upcoming VariantAnnotation 1.18.5, using this invalid file, only the counts for the first allele will be used, even for the second allele, which is obviously wrong. Ideally, the VCF parser would throw an error or warning if it encounters multiple values for a Number=1 field. Right now, it silently takes the first. Maybe one of the authors of the parser could consider that.

ADD REPLY

Login before adding your answer.

Traffic: 899 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6