I am trying to obtain coverage summary stats (e.g. mean/sum) from a BigWig across a set of ranges. I would like to know if I'm currently doing this in the fastest possible way using rtracklayer::import
.
If I set as = "GRanges"
, this gives one score per range in the BigWig (rather than the desired 1 score per input range).
If I set as = "NumericList"
, I can obtain the coverage across individual bases then compute the summary stats for each range using e.g. BiocGenerics::mean
. However, I was wondering if there exists a (faster) method of calculating this directly using rtracklayer::import
? I can't seem to find anything in the vignette or function documentation.
test_path <- system.file("tests", package = "rtracklayer")
test_bw <- file.path(test_path, "test.bw")
ranges <- GenomicRanges::GRanges(c("chr2:1-400", "chr2:500-1000"))
gr <- rtracklayer::import(test_bw, which = ranges, as = "GRanges")
gr
#> GRanges object with 5 ranges and 1 metadata column:
#> seqnames ranges strand | score
#> <Rle> <IRanges> <Rle> | <numeric>
#> [1] chr2 1-300 * | -1.00
#> [2] chr2 301-400 * | -0.75
#> [3] chr2 500-600 * | -0.75
#> [4] chr2 601-900 * | -0.50
#> [5] chr2 901-1000 * | -0.25
#> -------
#> seqinfo: 2 sequences from an unspecified genome
nl <- rtracklayer::import(test_bw, which = ranges, as = "NumericList")
BiocGenerics::mean(nl)
#> [1] -0.937500 -0.500499
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] pillar_1.4.6 compiler_4.0.3
#> [3] highr_0.8 GenomeInfoDb_1.26.0
#> [5] XVector_0.30.0 MatrixGenerics_1.2.0
#> [7] bitops_1.0-6 tools_4.0.3
#> [9] zlibbioc_1.36.0 digest_0.6.27
#> [11] lattice_0.20-41 evaluate_0.14
#> [13] lifecycle_0.2.0 tibble_3.0.4
#> [15] pkgconfig_2.0.3 rlang_0.4.8
#> [17] Matrix_1.2-18 reprex_0.3.0.9001
#> [19] DelayedArray_0.16.0 rstudioapi_0.11
#> [21] yaml_2.2.1 parallel_4.0.3
#> [23] xfun_0.19 GenomeInfoDbData_1.2.4
#> [25] rtracklayer_1.50.0 styler_1.3.2
#> [27] stringr_1.4.0 knitr_1.30
#> [29] Biostrings_2.58.0 fs_1.5.0
#> [31] vctrs_0.3.4 S4Vectors_0.28.0
#> [33] IRanges_2.24.0 grid_4.0.3
#> [35] stats4_4.0.3 Biobase_2.50.0
#> [37] XML_3.99-0.5 BiocParallel_1.24.0
#> [39] rmarkdown_2.5 purrr_0.3.4
#> [41] magrittr_1.5 matrixStats_0.57.0
#> [43] GenomicAlignments_1.26.0 backports_1.2.0
#> [45] Rsamtools_2.6.0 ellipsis_0.3.1
#> [47] htmltools_0.5.0 BiocGenerics_0.36.0
#> [49] GenomicRanges_1.42.0 SummarizedExperiment_1.20.0
#> [51] stringi_1.5.3 RCurl_1.98-1.2
#> [53] crayon_1.3.4
Created on 2020-11-09 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0.9001)