retrieving phastConsScores via 'scores' from 'VariantFiltering' shows inconsistency
1
0
Entering edit mode
@stefanie-tauber-3978
Last seen 2.6 years ago
Germany

Hi,

I would like to retrieve phastconsscores for a set of ranges from the package 'phastCons100way.UCSC.hg19'. By default, scores will average the retrieved phastconsscores if more than one position is retrieved. Here I retrieve 10 times two positions. Interestingly, I get different values if I retrieve the scores for the entire set of ranges or, respectively, for just a subset:

 

library(BSgenome.Hsapiens.UCSC.hg19)
library(Genomicanges)
library(phastCons100way.UCSC.hg19)

 x = GRanges(seqnames = c("chr12", "chr15", "chr15", "chr2","chr2", "chr2","chr3","chr5","chr2","chr15"),
                         ranges = IRanges(start =  c(46357547,  99482408,99482417,42484368,42484372, 180849279,63983256,38952875, 85768190, 83676233),
                         width = rep(2,10)), strand = rep("+",10))

scores(phastCons100way.UCSC.hg19, x)

[1] 0.00 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00

scores(phastCons100way.UCSC.hg19, x[1:4])

[1] 0 0 0 0

 

Why do I have on the fourth position once '0' and once '0.5'?

 

Best,

Stefanie

 

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] phastCons100way.UCSC.hg19_3.1.0   VariantFiltering_1.4.0           
 [3] VariantAnnotation_1.14.0          Rsamtools_1.20.0                 
 [5] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.36.0                  
 [7] rtracklayer_1.28.0                Biostrings_2.36.0                
 [9] XVector_0.8.0                     GenomicRanges_1.20.1             
[11] GenomeInfoDb_1.4.0                IRanges_2.2.0                    
[13] S4Vectors_0.6.0                   BiocGenerics_0.14.0              

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.5             biovizBase_1.16.0       lattice_0.20-31        
 [4] digest_0.6.8            mime_0.3                R6_2.0.1               
 [7] plyr_1.8.1              futile.options_1.0.0    acepack_1.3-3.3        
[10] RSQLite_1.0.0           ggplot2_1.0.1           zlibbioc_1.14.0        
[13] GenomicFeatures_1.20.0  rpart_4.1-9             proto_0.3-10           
[16] splines_3.2.0           BiocParallel_1.2.0      stringr_0.6.2          
[19] foreign_0.8-63          RCurl_1.95-4.5          biomaRt_2.24.0         
[22] munsell_0.4.2           shiny_0.11.1            httpuv_1.3.2           
[25] Gviz_1.12.0             htmltools_0.2.6         nnet_7.3-9             
[28] Hmisc_3.15-0            matrixStats_0.14.0      XML_3.98-1.1           
[31] GenomicAlignments_1.4.0 MASS_7.3-40             bitops_1.0-6           
[34] grid_3.2.0              RBGL_1.44.0             xtable_1.7-4           
[37] gtable_0.1.2            DBI_0.3.1               scales_0.2.4           
[40] graph_1.46.0            reshape2_1.4.1          latticeExtra_0.6-26    
[43] futile.logger_1.4       Formula_1.2-1           lambda.r_1.1.7         
[46] RColorBrewer_1.1-2      tools_3.2.0             RJSONIO_1.3-0          
[49] dichromat_2.0-0         Biobase_2.28.0          survival_2.38-1        
[52] AnnotationDbi_1.30.0    colorspace_1.2-6        cluster_2.0.1  

 

variantfiltering phastcons • 1.5k views
ADD COMMENT
1
Entering edit mode
Robert Castelo ★ 3.4k
@rcastelo
Last seen 5 weeks ago
Barcelona/Universitat Pompeu Fabra

Hi Stefanie,

thanks for the bug report and the reproducible example. The scores() method was not handling properly the case in which scores were averaged from muliple-nucleotide ranges in a GRanges object with un-ordered sequence names, The 0.05 value belonged to the last range from chr15 which was mistakenly being inserted in position 4th before chr2 because the original order was not taken into accout as it correctly happens with single-nucleotide ranges where this was working correctly.

This has been fixed in release 1.4.1 and devel 1.5.1, which should become available tomorrow saturday at about 11pm here in continental Europe.

cheers,

robert.

ADD COMMENT

Login before adding your answer.

Traffic: 522 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6