Search
Question: retrieving phastConsScores via 'scores' from 'VariantFiltering' shows inconsistency
0
gravatar for Stefanie Tauber
3.3 years ago by
Germany
Stefanie Tauber40 wrote:

Hi,

I would like to retrieve phastconsscores for a set of ranges from the package 'phastCons100way.UCSC.hg19'. By default, scores will average the retrieved phastconsscores if more than one position is retrieved. Here I retrieve 10 times two positions. Interestingly, I get different values if I retrieve the scores for the entire set of ranges or, respectively, for just a subset:

 

library(BSgenome.Hsapiens.UCSC.hg19)
library(Genomicanges)
library(phastCons100way.UCSC.hg19)

 x = GRanges(seqnames = c("chr12", "chr15", "chr15", "chr2","chr2", "chr2","chr3","chr5","chr2","chr15"),
                         ranges = IRanges(start =  c(46357547,  99482408,99482417,42484368,42484372, 180849279,63983256,38952875, 85768190, 83676233),
                         width = rep(2,10)), strand = rep("+",10))

scores(phastCons100way.UCSC.hg19, x)

[1] 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00

scores(phastCons100way.UCSC.hg19, x[1:4])

[1] 0 0 0 0

 

Why do I have on the fourth position once '0' and once '0.5'?

 

Best,

Stefanie

 

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] phastCons100way.UCSC.hg19_3.1.0   VariantFiltering_1.4.0           
 [3] VariantAnnotation_1.14.0          Rsamtools_1.20.0                 
 [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.36.0                  
 [7] rtracklayer_1.28.0                Biostrings_2.36.0                
 [9] XVector_0.8.0                     GenomicRanges_1.20.1             
[11] GenomeInfoDb_1.4.0                IRanges_2.2.0                    
[13] S4Vectors_0.6.0                   BiocGenerics_0.14.0              

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.5             biovizBase_1.16.0       lattice_0.20-31        
 [4] digest_0.6.8            mime_0.3                R6_2.0.1               
 [7] plyr_1.8.1              futile.options_1.0.0    acepack_1.3-3.3        
[10] RSQLite_1.0.0           ggplot2_1.0.1           zlibbioc_1.14.0        
[13] GenomicFeatures_1.20.0  rpart_4.1-9             proto_0.3-10           
[16] splines_3.2.0           BiocParallel_1.2.0      stringr_0.6.2          
[19] foreign_0.8-63          RCurl_1.95-4.5          biomaRt_2.24.0         
[22] munsell_0.4.2           shiny_0.11.1            httpuv_1.3.2           
[25] Gviz_1.12.0             htmltools_0.2.6         nnet_7.3-9             
[28] Hmisc_3.15-0            matrixStats_0.14.0      XML_3.98-1.1           
[31] GenomicAlignments_1.4.0 MASS_7.3-40             bitops_1.0-6           
[34] grid_3.2.0              RBGL_1.44.0             xtable_1.7-4           
[37] gtable_0.1.2            DBI_0.3.1               scales_0.2.4           
[40] graph_1.46.0            reshape2_1.4.1          latticeExtra_0.6-26    
[43] futile.logger_1.4       Formula_1.2-1           lambda.r_1.1.7         
[46] RColorBrewer_1.1-2      tools_3.2.0             RJSONIO_1.3-0          
[49] dichromat_2.0-0         Biobase_2.28.0          survival_2.38-1        
[52] AnnotationDbi_1.30.0    colorspace_1.2-6        cluster_2.0.1  

 

ADD COMMENTlink modified 3.3 years ago by Robert Castelo2.2k • written 3.3 years ago by Stefanie Tauber40
1
gravatar for Robert Castelo
3.3 years ago by
Robert Castelo2.2k
Spain/Barcelona/Universitat Pompeu Fabra
Robert Castelo2.2k wrote:

Hi Stefanie,

thanks for the bug report and the reproducible example. The scores() method was not handling properly the case in which scores were averaged from muliple-nucleotide ranges in a GRanges object with un-ordered sequence names, The 0.05 value belonged to the last range from chr15 which was mistakenly being inserted in position 4th before chr2 because the original order was not taken into accout as it correctly happens with single-nucleotide ranges where this was working correctly.

This has been fixed in release 1.4.1 and devel 1.5.1, which should become available tomorrow saturday at about 11pm here in continental Europe.

cheers,

robert.

ADD COMMENTlink written 3.3 years ago by Robert Castelo2.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 251 users visited in the last hour