Integer overflow when summing an 'integer' Rle
@valerie-obenchain-4275
Hi Nico,

The following fixes have been applied to IRanges 1.15.43:

(1) The 'Integer overflow' warning thrown by sum() on an integer-Rle is now more appropriate:

    library(IRanges)
    x <- Rle(values=as.integer(c(1, 2^31 - 1, 1)))
    > sum(x)
    [1] NA
    Warning message:
    In sum(runValue(x) * runLength(x), ..., na.rm = na.rm) :
      Integer overflow - use runValue(.) <- as.numeric(runValue(.))

(2) Integers are coerced to numeric when calling mean() on an integer-Rle:

    > mean(x)
    [1] 715827883

Valerie

## Paste of original correspondence between Nico and Herve

[BioC] Integer overflow when summing an 'integer' Rle
Nicolas Delhomme delhomme at embl.de
Tue Feb 14 17:35:48 CET 2012

Salut Hervé,

Bonne année! Well, we're already mid-Feb, but still most of it is in front of us ;-)

On 10 Feb 2012, at 19:30, Hervé Pagès wrote:

> Salut Nico,
>
> On 02/10/2012 08:04 AM, Nicolas Delhomme wrote:
>> Hi all,
>>
>> While calculating some statistics of an RNA-seq experiment I stumbled upon the following problem. Applying the IRanges coverage function to my IRanges, I get back an integer Rle object. However, trying to get the mean or sum of that Rle object results in an integer overflow. The following example just exemplifies that overflow.
>>
>> library(IRanges)
>> rC<- Rle(values=as.integer(c(1,(2^31)-1,1)))
>> sum(rC)
>> mean(rC)
>>
>> Both result in an integer overflow.
>>
>> [1] NA
>> Warning message:
>> In sum(runValue(x) * runLength(x), ..., na.rm = na.rm) :
>> Integer overflow - use sum(as.numeric(.))
>>
>> The solution to that is to do the following:
>>
>> sum(as.numeric(runLength(rC) * runValue(rC)))
>
> Another solution is to convert the 'integer' Rle into a 'numeric' Rle
> before doing sum(). Unfortunately, since we don't have separate
> classes for those (like for example an IntegerRle and a DoubleRle
> class) it cannot be done using direct coercion, i.e. with something
> like:
>
> as(rC, "DoubleRle")
>
> (Maybe we should have individual Rle subclasses for 'integer' Rle,
> 'numeric' Rle, 'logical' Rle, 'character' Rle, 'factor' Rle etc...)
>

That could be useful. A few times I have had to do quite a few conversions to go back and forth between different Rle "kinds". Having subclasses would be great.

> So for now, this conversion must be done with:
>
> > class(runValue(rC)) <- "double"
> > rC
> 'numeric' Rle of length 3 with 3 runs
> Lengths: 1 1 1
> Values : 1 2147483647 1
>
> This works fine with an Rle, but not so much with an RleList where
> one needs to do some ugly contortions in order to succeed.

Well, I ended up doing that in an lapply and it works just fine. Not the most memory-efficient approach, though.

>
> As an alternative to having individual Rle subclasses, maybe we could have
> an accessor e.g. rleValueType(), with getter and setters, so we could
> do:
>
> > rleValueType(rC)
> [1] "integer"
> > rleValueType(rC) <- "double"
>
> and that would work on Rle and RleList objects.
>

That would indeed be very useful and probably easier to implement.

> Anyway, even though I think having an easy/unified way for changing
> the type of the values in Rle/RleList objects is important, maybe
> I'm going slightly off-topic.
>
> What we should definitely do now is replace this warning:
>
> Warning message:
> In sum(runValue(x) * runLength(x), ..., na.rm = na.rm) :
> Integer overflow - use sum(as.numeric(.))
>
> by a more appropriate one (doing as.numeric() on an Rle is not a good
> idea).
>

Indeed.

>>
>> but IMO it should be handled at the Rle level code; i.e. an integer Rle can clearly have a sum, a mean, etc. whose result involves calculating values outside the integer range.
>
> I agree for mean() so I'll fix that.
>
> But for sum()... "calculating values outside the integer range",
> even if the result of this calculation itself is not in the
> integer range? base::sum() will return NA if the result is not in
> the integer range and I think that's the right thing to do.
> I don't like the idea of sum() returning a double when the input
> is integer.
>

I'm on the same page here. Consistency (especially for R) is crucial. Under these conditions, having a meaningful warning would indeed be the best.

Thanks for the detailed answer and for the slightly off-topic "diversion".

Cheers,

Nico

> Cheers,
> H.
>
>> Is there anything that speaks against having these functions internally convert the integer values to numeric before calculating the sum or mean?
>>
>> Looking forward to hearing your thoughts on this,
>>
>> Cheers,
>>
>> Nico
>>
>> sessionInfo()
>> R Under development (unstable) (2012-02-07 r58290)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] IRanges_1.13.24 BiocGenerics_0.1.4
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.15.0
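Hervé's remark that base::sum() returns NA when an integer result does not fit can be checked without IRanges. The following is only a sketch: it uses base R's rle() as a stand-in for the lengths/values layout of an Rle (an assumption for illustration, not the IRanges class), and all variable names are made up.

```r
# Base-R illustration; rle() here only mimics the lengths/values layout of an Rle.
v <- c(1L, .Machine$integer.max, 1L)   # same values as in the thread
r <- rle(v)                            # three runs, each of length 1

# Integer arithmetic: the true total (2^31 + 1) does not fit in an integer,
# so sum() returns NA and warns about integer overflow.
int_total <- suppressWarnings(sum(r$values * r$lengths))

# Converting the values to double first avoids the overflow.
dbl_total <- sum(as.numeric(r$values) * r$lengths)

# Mean over all positions, computed in double precision.
dbl_mean <- dbl_total / sum(r$lengths)
```

Here dbl_mean comes out as 715827883, the value mean(x) reports in the post above. Note that the conversion to double happens before the multiplication; converting only the product would not help if a single value times its run length already exceeded the integer range.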
@delhommeemblde-3232
Great! Thanks,

Nico

On 4 Sep 2012, at 22:16, Valerie Obenchain wrote:

> Hi Nico,
>
> The following fixes have been applied to IRanges 1.15.43
> [...]
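For the RleList case raised in the original correspondence (converting each element "in an lapply"), a minimal sketch might look like the following. It assumes a current Bioconductor installation where library(IRanges) makes Rle, RleList, runValue<- and endoapply() available; to_numeric_rle and the example data are hypothetical names introduced here, not part of IRanges, and behaviour may differ in the 2012 versions discussed in this thread.

```r
# Sketch only: assumes the Bioconductor package IRanges is installed.
library(IRanges)

# Hypothetical helper (not part of IRanges): turn an Rle's run values into doubles.
to_numeric_rle <- function(r) {
  runValue(r) <- as.numeric(runValue(r))
  r
}

big <- .Machine$integer.max
cov_list <- RleList(chr1 = Rle(c(1L, big, 1L)),
                    chr2 = Rle(c(big, big)))

# endoapply() applies the helper to every element and returns an RleList again.
cov_num <- endoapply(cov_list, to_numeric_rle)

# Per-element totals, now computed in double precision.
totals <- sapply(cov_num, sum)
```

Using endoapply() rather than lapply() is a design choice of this sketch: it keeps the result an RleList instead of a plain list. Every element is still copied during the conversion, so the memory cost Nico mentions remains.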