Question: Maximal length of Rle vectors
8.0 years ago by
United States
Hans-Ulrich Klein wrote:
Dear all,

I observed this problem regarding the maximal length of an Rle vector:

    > rle = Rle(rep(0, 1000000000))
    > length(rle)
    [1] 1000000000
    > length(c(rle, rle, rle))
    [1] -1294967296

This is probably caused by the largest positive value (~2.1E9) that an integer variable can represent. However, no warning is issued.

I noticed this problem when I wanted to calculate the average coverage of a sequencing project across the human genome. I used the coverage() method and then concatenated all chromosomes. This should give me an Rle vector of length ~3*10^9, but mean() does not work on that vector.

Best,
Hans-Ulrich

    > sessionInfo()
    R version 2.14.0 (2011-10-31)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] IRanges_1.12.3
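The negative length reported above is consistent with a 32-bit signed integer wrapping around. The following base-R sketch reproduces the arithmetic only; it does not use IRanges. `wrap_int32` is a hypothetical helper introduced here for illustration and is not part of any package.

```r
# Largest value an R integer can hold (2^31 - 1).
int_max <- .Machine$integer.max

# Hypothetical helper: the value a length would take after wrapping into
# a signed 32-bit integer. Doubles are used throughout, so the arithmetic
# in this sketch cannot itself overflow.
wrap_int32 <- function(x) ((x + 2^31) %% 2^32) - 2^31

true_len     <- 3 * 1e9              # three Rle vectors of length 1e9 each
reported_len <- wrap_int32(true_len)

print(int_max)                            # 2147483647
print(true_len > int_max)                 # TRUE
format(reported_len, scientific = FALSE)  # "-1294967296"
```

The wrapped value matches the `-1294967296` shown in the report, which supports the guess that the concatenated length exceeded the integer range.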
sequencing • 614 views
modified 8.0 years ago by Hervé Pagès ♦♦ 14k • written 8.0 years ago by Hans-Ulrich Klein330
Answer: Maximal length of Rle vectors
8.0 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:
Hi Hans-Ulrich,

Thanks for the bug report. A fix is on its way. It will raise an error on any attempt to create an Rle with length > .Machine$integer.max.

Allowing an Rle to have a length > .Machine$integer.max, even with a warning, would cause all sorts of problems, the first of them being that its length would be NA:

    > Rle(1:2, c(1500000000, 1500000000))
    'integer' Rle of length NA with 2 runs
      Lengths: 1500000000 1500000000
      Values :          1          2
    Warning message:
    In sum(runLength(x)) : Integer overflow - use sum(as.numeric(.))

Note that the coverage across the human genome is best represented by a named RleList (with one element per chromosome), which doesn't have the .Machine$integer.max limitation. See the "GenomicRanges Use Cases" vignette in the GenomicRanges package for an illustration of this.

Cheers,
H.

On 11-11-30 09:28 AM, Hans-Ulrich Klein wrote:
> Dear all,
>
> I observed this problem regarding the maximal length of a Rle vector:
>
> > rle = Rle(rep(0, 1000000000))
> > length(rle)
> [1] 1000000000
> > length(c(rle, rle, rle))
> [1] -1294967296
>
> Probably, it is caused by the maximum positive number (~2.1E9) that can
> be represented by an integer variable. However, there is no warning
> message.
> I noticed this problem when I wanted to calculate the average coverage
> of a sequencing project across the human genome. I used the coverage()
> method and then concatenated all chromosomes. This should give me an Rle
> vector of length ~3*10^9, but mean() does not work on that vector.
>
> Best,
> Hans-Ulrich

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
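The two points in the answer can be sketched in base R without IRanges: summing integer run lengths past .Machine$integer.max yields NA, and a genome-wide mean can be computed chromosome by chromosome without ever building one long vector. This is only a sketch under stated assumptions. The plain lists below stand in for the elements of an RleList, and the chromosome names and coverage values are made up for illustration.

```r
# Summing integer run lengths beyond .Machine$integer.max gives NA
# (with an "Integer overflow" warning, suppressed here).
int_total <- suppressWarnings(sum(c(1500000000L, 1500000000L)))
print(int_total)  # NA

# Hypothetical per-chromosome coverage, stored as run lengths and run values.
# Plain lists stand in for the elements of an RleList; the data are invented.
cov_by_chr <- list(
  chrA = list(lengths = c(1500000000, 500000000), values = c(0, 4)),
  chrB = list(lengths = c(1000000000),            values = c(2))
)

# Total number of positions, accumulated as doubles to avoid integer overflow.
total_len <- sum(vapply(cov_by_chr,
                        function(x) sum(as.numeric(x$lengths)),
                        numeric(1)))

# Total coverage: each run contributes (run length * run value).
total_cov <- sum(vapply(cov_by_chr,
                        function(x) sum(as.numeric(x$lengths) * x$values),
                        numeric(1)))

# Length-weighted mean across all chromosomes, with no concatenation.
mean_cov <- total_cov / total_len
print(total_len)  # 3e+09
print(mean_cov)   # 1.333333
```

With real IRanges objects, the same weighting could presumably be built from the run lengths and run values of each list element, but that is an assumption and has not been checked against any particular IRanges version.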