Question: retrieving phastConsScores via 'scores' from 'VariantFiltering' shows inconsistency
gravatar for Stefanie Tauber
4.6 years ago by
Stefanie Tauber40 wrote:


I would like to retrieve phastconsscores for a set of ranges from the package 'phastCons100way.UCSC.hg19'. By default, scores will average the retrieved phastconsscores if more than one position is retrieved. Here I retrieve 10 times two positions. Interestingly, I get different values if I retrieve the scores for the entire set of ranges or, respectively, for just a subset:



 x = GRanges(seqnames = c("chr12", "chr15", "chr15", "chr2","chr2", "chr2","chr3","chr5","chr2","chr15"),
                         ranges = IRanges(start =  c(46357547,  99482408,99482417,42484368,42484372, 180849279,63983256,38952875, 85768190, 83676233),
                         width = rep(2,10)), strand = rep("+",10))

scores(phastCons100way.UCSC.hg19, x)

[1] 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00

scores(phastCons100way.UCSC.hg19, x[1:4])

[1] 0 0 0 0


Why do I have on the fourth position once '0' and once '0.5'?





> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] phastCons100way.UCSC.hg19_3.1.0   VariantFiltering_1.4.0           
 [3] VariantAnnotation_1.14.0          Rsamtools_1.20.0                 
 [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.36.0                  
 [7] rtracklayer_1.28.0                Biostrings_2.36.0                
 [9] XVector_0.8.0                     GenomicRanges_1.20.1             
[11] GenomeInfoDb_1.4.0                IRanges_2.2.0                    
[13] S4Vectors_0.6.0                   BiocGenerics_0.14.0              

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.5             biovizBase_1.16.0       lattice_0.20-31        
 [4] digest_0.6.8            mime_0.3                R6_2.0.1               
 [7] plyr_1.8.1              futile.options_1.0.0    acepack_1.3-3.3        
[10] RSQLite_1.0.0           ggplot2_1.0.1           zlibbioc_1.14.0        
[13] GenomicFeatures_1.20.0  rpart_4.1-9             proto_0.3-10           
[16] splines_3.2.0           BiocParallel_1.2.0      stringr_0.6.2          
[19] foreign_0.8-63          RCurl_1.95-4.5          biomaRt_2.24.0         
[22] munsell_0.4.2           shiny_0.11.1            httpuv_1.3.2           
[25] Gviz_1.12.0             htmltools_0.2.6         nnet_7.3-9             
[28] Hmisc_3.15-0            matrixStats_0.14.0      XML_3.98-1.1           
[31] GenomicAlignments_1.4.0 MASS_7.3-40             bitops_1.0-6           
[34] grid_3.2.0              RBGL_1.44.0             xtable_1.7-4           
[37] gtable_0.1.2            DBI_0.3.1               scales_0.2.4           
[40] graph_1.46.0            reshape2_1.4.1          latticeExtra_0.6-26    
[43] futile.logger_1.4       Formula_1.2-1           lambda.r_1.1.7         
[46] RColorBrewer_1.1-2      tools_3.2.0             RJSONIO_1.3-0          
[49] dichromat_2.0-0         Biobase_2.28.0          survival_2.38-1        
[52] AnnotationDbi_1.30.0    colorspace_1.2-6        cluster_2.0.1  


variantfiltering phastcons • 774 views
ADD COMMENTlink modified 4.6 years ago by Robert Castelo2.3k • written 4.6 years ago by Stefanie Tauber40
Answer: retrieving phastConsScores via 'scores' from 'VariantFiltering' shows inconsist
gravatar for Robert Castelo
4.6 years ago by
Robert Castelo2.3k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.3k wrote:

Hi Stefanie,

thanks for the bug report and the reproducible example. The scores() method was not handling properly the case in which scores were averaged from muliple-nucleotide ranges in a GRanges object with un-ordered sequence names, The 0.05 value belonged to the last range from chr15 which was mistakenly being inserted in position 4th before chr2 because the original order was not taken into accout as it correctly happens with single-nucleotide ranges where this was working correctly.

This has been fixed in release 1.4.1 and devel 1.5.1, which should become available tomorrow saturday at about 11pm here in continental Europe.



ADD COMMENTlink written 4.6 years ago by Robert Castelo2.3k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 364 users visited in the last hour