Integer overflow when summing an 'integer' Rle
Hi Nico,

The following fixes have been applied to IRanges 1.15.43:

(1) The 'Integer overflow' warning thrown by sum() on an integer-Rle is
now more appropriate:

library(IRanges)
x <- Rle(values=as.integer(c(1, 2^31 - 1, 1)))
> sum(x)
[1] NA
Warning message:
In sum(runValue(x) * runLength(x), ..., na.rm = na.rm) :
  Integer overflow - use runValue(.) <- as.numeric(runValue(.))

(2) Integers are coerced to numeric when calling mean() on an
integer-Rle:

> mean(x)
[1] 715827883

Valerie

## Paste of original correspondence between Nico and Herve

[BioC] Integer overflow when summing an 'integer' Rle
Nicolas Delhomme delhomme at embl.de
Tue Feb 14 17:35:48 CET 2012

Hi Hervé,

Happy New Year! Well, we're already mid-Feb, but still most of it is in front of us ;-)

On 10 Feb 2012, at 19:30, Hervé Pagès wrote:

> Hi Nico,
>
> On 02/10/2012 08:04 AM, Nicolas Delhomme wrote:
>> Hi all,
>>
>> While calculating some statistics of an RNA-seq experiment I stumbled onto the following problem. Applying the IRanges coverage function to my IRanges, I get back an integer Rle object. However, trying to get the mean or sum of that Rle object results in an integer overflow. The following example just exemplifies that overflow.
>>
>> library(IRanges)
>> rC <- Rle(values=as.integer(c(1,(2^31)-1,1)))
>> sum(rC)
>> mean(rC)
>>
>> Both result in an integer overflow.
>>
>> [1] NA
>> Warning message:
>> In sum(runValue(x) * runLength(x), ..., na.rm = na.rm) :
>> Integer overflow - use sum(as.numeric(.))
>>
>> The solution to that is to do the following:
>>
>> sum(as.numeric(runLength(rC) * runValue(rC)))
>
> Another solution is to convert the 'integer' Rle into a 'numeric' Rle
> before doing sum(). Unfortunately, since we don't have separate
> classes for those (like for example an IntegerRle and a DoubleRle
> class) it cannot be done using direct coercion i.e.
> with something like:
>
> as(rC, "DoubleRle")
>
> (Maybe we should have individual Rle subclasses for 'integer' Rle,
> 'numeric' Rle, 'logical' Rle, 'character' Rle, 'factor' Rle etc...)
>

That could be useful. I, a few times, had to do quite some conversions to go back and forth between different Rle "kinds". Having subclasses would be great.

> So for now, this conversion must be done with:
>
> > class(runValue(rC)) <- "double"
> > rC
> 'numeric' Rle of length 3 with 3 runs
> Lengths: 1 1 1
> Values : 1 2147483647 1
>
> This works fine with an Rle, but not so much with an RleList where
> one needs to do some ugly contortions in order to succeed.

Well, I ended up doing that in an lapply and it works just fine. Not the most efficient memory wise though.

>
> Alternatively to having individual Rle subclasses maybe we could have
> an accessor e.g. rleValueType(), with getter and setters, so we could
> do:
>
> > rleValueType(rC)
> [1] "integer"
> > rleValueType(rC) <- "double"
>
> and that would work on Rle and RleList objects.
>

That would indeed be very useful and probably easier to implement.

> Anyway, even though I think having an easy/unified way for changing
> the type of the values in Rle/RleList objects is important, maybe
> I'm going slightly off-topic.
>
> What we should definitely do now is replace this warning:
>
> Warning message:
> In sum(runValue(x) * runLength(x), ..., na.rm = na.rm) :
> Integer overflow - use sum(as.numeric(.))
>
> by a more appropriate one (doing as.numeric() on an Rle is not a good
> idea).
>

Indeed.

>>
>> but IMO it should be handled at the Rle level code; i.e. an integer Rle can clearly have a sum, a mean, etc... result that involve calculating values outside the integer range.
>
> I agree for mean() so I'll fix that.
>
> But for sum()... "calculating values outside the integer range",
> even if the result of this calculation itself is not in the
> integer range?
> base::sum() will return NA if the result is not in
> the integer range and I think that's the right thing to do.
> I don't like the idea of sum() returning a double when the input
> is integer.
>

I'm on the same page here. Consistency (especially for R) is crucial. Under these conditions, having a meaningful warning would indeed be the best.

Thanks for the detailed answer and for the slightly off-topic "diversion".

Cheers,

Nico

> Cheers,
> H.
>
>> Is there anything that speaks against having these functions internally converting the integer values to numeric before calculating the sum or mean?
>>
>> Looking forward to hearing your thoughts on this,
>>
>> Cheers,
>>
>> Nico

Great!
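
For anyone reading this thread later, the overflow and the convert-to-double workaround can be reproduced in base R without IRanges. What follows is only a minimal sketch and not the IRanges implementation: plain vectors stand in for runValue() and runLength(), and the helper name weighted_run_sum() is hypothetical.

```r
# Plain integer vectors standing in for runValue(rC) and runLength(rC).
run_values  <- c(1L, .Machine$integer.max, 1L)
run_lengths <- c(1L, 1L, 1L)

# base::sum() on integers returns NA (with a warning) when the total
# does not fit in the integer range.
int_sum <- suppressWarnings(sum(run_values * run_lengths))

# Hypothetical helper (not part of IRanges): convert to double *before*
# multiplying, so neither the product nor the sum hits the integer limit.
weighted_run_sum <- function(values, lengths) {
  sum(as.numeric(values) * as.numeric(lengths))
}

dbl_sum  <- weighted_run_sum(run_values, run_lengths)
dbl_mean <- dbl_sum / sum(run_lengths)

print(int_sum)
print(dbl_sum)
print(dbl_mean)
```

Converting before the multiplication matters: an integer-by-integer product can itself overflow to NA before as.numeric() is ever applied. On an actual Rle, the equivalent step is the one the new warning message suggests, runValue(rC) <- as.numeric(runValue(rC)), followed by sum(rC).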