Search
Question: IRanges: Need help speeding up sliding window analysis
0
8.1 years ago by
Michael Dondrup550 wrote:
Hi, I need some help with speeding up a sliding window analysis on an Rle object of length > 1 million. I am using functions 'successiveViews' with negative gap width and 'viewApply' vs. viewMeans. My goal is to apply a discrete differential operator that computes the difference between the 'left' half of the window and the 'right', aka. a cheap discrete numeric first order differentiation. What I found is: viewMeans(x) << viewApply(x, mean) << viewApply(x, diff.op) in terms of time, example below. Is there a way to pimp this code to make it work on the genome scale? I appreciate your input, I am confident there is a better way to do it. Thank you very much Michael Code example: diff.op <- function(x, lrprop=1/2) { len = length(x) i = ceiling(len*lrprop) (sum(x[i:len]) - sum(x[1:i])) / len } sliding.window.apply <- function(object, width, fun, ...) { x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) return (Rle(viewApply(x, fun, ...))) } sliding.window.mean <- function(object, width) { x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) return (viewMeans(x)) } rle <- Rle(1:10000) > system.time(sliding.window.mean(rle, 30)) user system elapsed 0.036 0.004 0.098 > system.time(sliding.window.apply(rle, 30, mean)) user system elapsed 4.380 0.065 6.010 > system.time(sliding.window.apply(rle, 30, diff.op)) user system elapsed 38.857 0.204 39.127 > sessionInfo() R version 2.11.1 (2010-05-31) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] IRanges_1.6.6 loaded via a namespace (and not attached): [1] annotate_1.26.1 AnnotationDbi_1.10.2 Biobase_2.8.0 DBI_0.2-5 DESeq_1.0.4 [6] genefilter_1.30.0 geneplotter_1.26.0 grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-1 [11] splines_2.11.1 survival_2.35-8 xtable_1.5-6
modified 8.1 years ago by Martin Morgan ♦♦ 22k • written 8.1 years ago by Michael Dondrup550
0
8.1 years ago by
Martin Morgan ♦♦ 22k
United States
Martin Morgan ♦♦ 22k wrote:
On 09/24/2010 03:11 AM, Michael Dondrup wrote: > Hi, > > I need some help with speeding up a sliding window analysis on an Rle object of length > 1 million. > I am using functions 'successiveViews' with negative gap width and 'viewApply' vs. viewMeans. > My goal is to apply a discrete differential operator that computes the difference between the 'left' > half of the window and the 'right', aka. a cheap discrete numeric first order differentiation. > > What I found is: viewMeans(x) << viewApply(x, mean) << viewApply(x, diff.op) in terms of time, example below. > Is there a way to pimp this code to make it work on the genome scale? I appreciate your input, I am confident there > is a better way to do it. maybe win <- 30 diff(cumsum(rle), win) / win for numeric (not integer) rle, though there might be rounding problems if cumsum gets large. A strategy might be to break the Rle into regions separated by islands of at least 'win' 0's (using runLength / runValue to identify candidate break points), which allows one to reset the cumsum. Some inspiration might come from http://www.mail-archive.com/r-help at r-project.org/msg75280.html Also the end points might need fiddling (e.g., by padding rle with 'win' trailing zeros, which is I think in effect what successiveViews does. Martin > > Thank you very much > Michael > > Code example: > > diff.op <- function(x, lrprop=1/2) { > len = length(x) > i = ceiling(len*lrprop) > (sum(x[i:len]) - sum(x[1:i])) / len > } > > sliding.window.apply <- function(object, width, fun, ...) { > x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) > return (Rle(viewApply(x, fun, ...))) > } > > sliding.window.mean <- function(object, width) { > x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) > return (viewMeans(x)) > } > > rle <- Rle(1:10000) > >> system.time(sliding.window.mean(rle, 30)) > user system elapsed > 0.036 0.004 0.098 >> system.time(sliding.window.apply(rle, 30, mean)) > user system elapsed > 4.380 0.065 6.010 >> system.time(sliding.window.apply(rle, 30, diff.op)) > user system elapsed > 38.857 0.204 39.127 > >> sessionInfo() > R version 2.11.1 (2010-05-31) > x86_64-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] IRanges_1.6.6 > > loaded via a namespace (and not attached): > [1] annotate_1.26.1 AnnotationDbi_1.10.2 Biobase_2.8.0 DBI_0.2-5 DESeq_1.0.4 > [6] genefilter_1.30.0 geneplotter_1.26.0 grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-1 > [11] splines_2.11.1 survival_2.35-8 xtable_1.5-6 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor