delhomme@embl.de
Hi,
I've just discovered that the IRanges coverage function can overflow
silently, without any warning. Below is an example that reproduces it:
> library(IRanges)
> rngs <- IRanges(c(1:100),width=100)
> coverage(rngs)
'integer' Rle of length 199 with 199 runs
Lengths: 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
Values : 1 2 3 4 5 6 7 8 9 10 11 ... 10 9 8 7 6 5 4 3 2 1
> coverage(rngs,weight=1e9)
'integer' Rle of length 200 with 200 runs
Lengths: 1 1 1 ... 1 1
Values : 1000000000 2000000000 -1294967296 ... 1000000000 0
> runValue(coverage(rngs,weight=1e9))
[1] 1000000000 2000000000 -1294967296 -294967296 705032704 1705032704
[7] -1589934592 -589934592 410065408 1410065408 -1884901888 -884901888
...
Clearly, the third position, which has an unweighted coverage of 3,
has a weighted coverage of 3e9. That exceeds 2^31 - 1, the largest
signed 32-bit integer (the type R uses for integers), so the value
wraps around to -1294967296. I'm just surprised that this happens
silently.
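To make the wraparound explicit, here is a minimal base-R sketch that reproduces the first six values printed above. The helper `wrap_int32` is hypothetical (my own name, not part of IRanges); it only mimics what a 32-bit signed integer would hold after wrapping. Plain R integer arithmetic would return NA with a warning on overflow, so the wrapped values presumably come from compiled code, but that is an assumption on my part.

```r
# Largest value an R integer can hold (32-bit signed): 2^31 - 1
int_max <- .Machine$integer.max

# Hypothetical helper (not an IRanges function): map a double onto the
# value a wrapped 32-bit signed integer would hold.
wrap_int32 <- function(x) {
  r <- x %% 2^32                    # reduce modulo 2^32
  ifelse(r >= 2^31, r - 2^32, r)    # upper half maps to negative values
}

# Unweighted coverage 1..6 at the first six positions, times the weight
weighted <- c(1, 2, 3, 4, 5, 6) * 1e9
wrapped  <- wrap_int32(weighted)

print(int_max)
print(wrapped)
```

The wrapped values match the start of the runValue() output, which suggests a plain 32-bit wraparound rather than some other kind of corruption.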
For NGS, getting a per-base coverage > 2^31 is unlikely, although I've
already seen extremely high coverage for ribosomal-like proteins that
was only about 3 orders of magnitude away (~2M X). This limits the range
of weights that can be used (weights can currently only be integers):
at such a position a weight of 100 would already give 2e8, within a
factor of about ten of the limit.
Is there a way around this? coverage is such a handy function.
I understand that integer weights probably make the computation
faster, but what would be the overhead of allowing numeric weights
instead? I don't mind looking under the hood if that helps.
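For what it's worth, here is a minimal sketch of one possible workaround, assuming the weight is a single scalar applied to every range (it does not cover per-range weights). The idea is to compute the unweighted coverage, which stays well within the integer range, and then rebuild the Rle with double values. The variable names `w`, `cov` and `cov_w` are mine.

```r
library(IRanges)

rngs <- IRanges(1:100, width = 100)
w <- 1e9                      # scalar weight, applied to every range

# Unweighted coverage: integer values between 1 and 100, no overflow
cov <- coverage(rngs)

# Rebuild the Rle with double values so the product cannot wrap around
cov_w <- Rle(as.numeric(runValue(cov)) * w, runLength(cov))

# The third position now holds 3e9 rather than a negative number
print(runValue(cov_w)[1:6])
```

This trades the integer Rle for a numeric one, so exactness is only guaranteed up to 2^53, which should be ample for coverage values.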
Cheers,
Nico
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] IRanges_1.15.17 BiocGenerics_0.3.0
loaded via a namespace (and not attached):
[1] stats4_2.15.1 tools_2.15.1
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany