Search
Question: IRanges: Need help speeding up sliding window analysis
0
gravatar for Michael Dondrup
7.2 years ago by
Michael Dondrup550 wrote:
Hi, I need some help with speeding up a sliding window analysis on an Rle object of length > 1 million. I am using functions 'successiveViews' with negative gap width and 'viewApply' vs. viewMeans. My goal is to apply a discrete differential operator that computes the difference between the 'left' half of the window and the 'right', aka. a cheap discrete numeric first order differentiation. What I found is: viewMeans(x) << viewApply(x, mean) << viewApply(x, diff.op) in terms of time, example below. Is there a way to pimp this code to make it work on the genome scale? I appreciate your input, I am confident there is a better way to do it. Thank you very much Michael Code example: diff.op <- function(x, lrprop=1/2) { len = length(x) i = ceiling(len*lrprop) (sum(x[i:len]) - sum(x[1:i])) / len } sliding.window.apply <- function(object, width, fun, ...) { x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) return (Rle(viewApply(x, fun, ...))) } sliding.window.mean <- function(object, width) { x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) return (viewMeans(x)) } rle <- Rle(1:10000) > system.time(sliding.window.mean(rle, 30)) user system elapsed 0.036 0.004 0.098 > system.time(sliding.window.apply(rle, 30, mean)) user system elapsed 4.380 0.065 6.010 > system.time(sliding.window.apply(rle, 30, diff.op)) user system elapsed 38.857 0.204 39.127 > sessionInfo() R version 2.11.1 (2010-05-31) x86_64-apple-darwin9.8.0 locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] IRanges_1.6.6 loaded via a namespace (and not attached): [1] annotate_1.26.1 AnnotationDbi_1.10.2 Biobase_2.8.0 DBI_0.2-5 DESeq_1.0.4 [6] genefilter_1.30.0 geneplotter_1.26.0 grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-1 [11] splines_2.11.1 survival_2.35-8 xtable_1.5-6
ADD COMMENTlink modified 7.2 years ago by Martin Morgan ♦♦ 20k • written 7.2 years ago by Michael Dondrup550
0
gravatar for Martin Morgan
7.2 years ago by
Martin Morgan ♦♦ 20k
United States
Martin Morgan ♦♦ 20k wrote:
On 09/24/2010 03:11 AM, Michael Dondrup wrote: > Hi, > > I need some help with speeding up a sliding window analysis on an Rle object of length > 1 million. > I am using functions 'successiveViews' with negative gap width and 'viewApply' vs. viewMeans. > My goal is to apply a discrete differential operator that computes the difference between the 'left' > half of the window and the 'right', aka. a cheap discrete numeric first order differentiation. > > What I found is: viewMeans(x) << viewApply(x, mean) << viewApply(x, diff.op) in terms of time, example below. > Is there a way to pimp this code to make it work on the genome scale? I appreciate your input, I am confident there > is a better way to do it. maybe win <- 30 diff(cumsum(rle), win) / win for numeric (not integer) rle, though there might be rounding problems if cumsum gets large. A strategy might be to break the Rle into regions separated by islands of at least 'win' 0's (using runLength / runValue to identify candidate break points), which allows one to reset the cumsum. Some inspiration might come from http://www.mail-archive.com/r-help at r-project.org/msg75280.html Also the end points might need fiddling (e.g., by padding rle with 'win' trailing zeros, which is I think in effect what successiveViews does. Martin > > Thank you very much > Michael > > Code example: > > diff.op <- function(x, lrprop=1/2) { > len = length(x) > i = ceiling(len*lrprop) > (sum(x[i:len]) - sum(x[1:i])) / len > } > > sliding.window.apply <- function(object, width, fun, ...) { > x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) > return (Rle(viewApply(x, fun, ...))) > } > > sliding.window.mean <- function(object, width) { > x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) > return (viewMeans(x)) > } > > rle <- Rle(1:10000) > >> system.time(sliding.window.mean(rle, 30)) > user system elapsed > 0.036 0.004 0.098 >> system.time(sliding.window.apply(rle, 30, mean)) > user system elapsed > 4.380 0.065 6.010 >> system.time(sliding.window.apply(rle, 30, diff.op)) > user system elapsed > 38.857 0.204 39.127 > >> sessionInfo() > R version 2.11.1 (2010-05-31) > x86_64-apple-darwin9.8.0 > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] IRanges_1.6.6 > > loaded via a namespace (and not attached): > [1] annotate_1.26.1 AnnotationDbi_1.10.2 Biobase_2.8.0 DBI_0.2-5 DESeq_1.0.4 > [6] genefilter_1.30.0 geneplotter_1.26.0 grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-1 > [11] splines_2.11.1 survival_2.35-8 xtable_1.5-6 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENTlink written 7.2 years ago by Martin Morgan ♦♦ 20k
Michael, The IRanges package contains a number of built-in running window functions (runsum, runmean, runwtsum, runq) that you might want to consider for this operation. Also, if I am understanding what you are trying to do correctly, something like width <- 30 halfWidth <- width/2 halfWidthSums <- runsum(rle, halfWidth) diff(halfWidthSums, halfWidth)/width should fit the bill. Cheers, Patrick Quoting Martin Morgan <mtmorgan at="" fhcrc.org="">: > On 09/24/2010 03:11 AM, Michael Dondrup wrote: >> Hi, >> >> I need some help with speeding up a sliding window analysis on an >> Rle object of length > 1 million. >> I am using functions 'successiveViews' with negative gap width and >> 'viewApply' vs. viewMeans. >> My goal is to apply a discrete differential operator that computes >> the difference between the 'left' >> half of the window and the 'right', aka. a cheap discrete numeric >> first order differentiation. >> >> What I found is: viewMeans(x) << viewApply(x, mean) << viewApply(x, >> diff.op) in terms of time, example below. >> Is there a way to pimp this code to make it work on the genome >> scale? I appreciate your input, I am confident there >> is a better way to do it. > > maybe > > win <- 30 > diff(cumsum(rle), win) / win > > for numeric (not integer) rle, though there might be rounding problems > if cumsum gets large. A strategy might be to break the Rle into regions > separated by islands of at least 'win' 0's (using runLength / runValue > to identify candidate break points), which allows one to reset the > cumsum. Some inspiration might come from > http://www.mail-archive.com/r-help at r-project.org/msg75280.html > > Also the end points might need fiddling (e.g., by padding rle with 'win' > trailing zeros, which is I think in effect what successiveViews does. > > Martin > >> >> Thank you very much >> Michael >> >> Code example: >> >> diff.op <- function(x, lrprop=1/2) { >> len = length(x) >> i = ceiling(len*lrprop) >> (sum(x[i:len]) - sum(x[1:i])) / len >> } >> >> sliding.window.apply <- function(object, width, fun, ...) { >> x <- trim(successiveViews(subject=object, width=rep(width, >> ceiling(length(object)) ), gap=-width+1)) >> return (Rle(viewApply(x, fun, ...))) >> } >> >> sliding.window.mean <- function(object, width) { >> x <- trim(successiveViews(subject=object, width=rep(width, >> ceiling(length(object)) ), gap=-width+1)) >> return (viewMeans(x)) >> } >> >> rle <- Rle(1:10000) >> >>> system.time(sliding.window.mean(rle, 30)) >> user system elapsed >> 0.036 0.004 0.098 >>> system.time(sliding.window.apply(rle, 30, mean)) >> user system elapsed >> 4.380 0.065 6.010 >>> system.time(sliding.window.apply(rle, 30, diff.op)) >> user system elapsed >> 38.857 0.204 39.127 >> >>> sessionInfo() >> R version 2.11.1 (2010-05-31) >> x86_64-apple-darwin9.8.0 >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] IRanges_1.6.6 >> >> loaded via a namespace (and not attached): >> [1] annotate_1.26.1 AnnotationDbi_1.10.2 Biobase_2.8.0 >> DBI_0.2-5 DESeq_1.0.4 >> [6] genefilter_1.30.0 geneplotter_1.26.0 grid_2.11.1 >> RColorBrewer_1.0-2 RSQLite_0.9-1 >> [11] splines_2.11.1 survival_2.35-8 xtable_1.5-6 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLYlink written 7.2 years ago by Patrick Aboyoun1.6k
Hi Patrick, just wanted to say thank you, that was exactly what I needed and it's much faster and more robust than my initial solution. It can be used f.e. to detect locations of strong changes in a numeric Rle vector, eg. a coverage, I didn't know that 'diff' could be used on Rle's like this. Best Michael Am Sep 24, 2010 um 5:51 PM schrieb Patrick Aboyoun: > Michael, > The IRanges package contains a number of built-in running window functions (runsum, runmean, runwtsum, runq) that you might want to consider for this operation. Also, if I am understanding what you are trying to do correctly, something like > > width <- 30 > halfWidth <- width/2 > halfWidthSums <- runsum(rle, halfWidth) > diff(halfWidthSums, halfWidth)/width > > should fit the bill. > > > Cheers, > Patrick > > > Quoting Martin Morgan <mtmorgan at="" fhcrc.org="">: > >> On 09/24/2010 03:11 AM, Michael Dondrup wrote: >>> Hi, >>> >>> I need some help with speeding up a sliding window analysis on an Rle object of length > 1 million. >>> I am using functions 'successiveViews' with negative gap width and 'viewApply' vs. viewMeans. >>> My goal is to apply a discrete differential operator that computes the difference between the 'left' >>> half of the window and the 'right', aka. a cheap discrete numeric first order differentiation. >>> >>> What I found is: viewMeans(x) << viewApply(x, mean) << viewApply(x, diff.op) in terms of time, example below. >>> Is there a way to pimp this code to make it work on the genome scale? I appreciate your input, I am confident there >>> is a better way to do it. >> >> maybe >> >> win <- 30 >> diff(cumsum(rle), win) / win >> >> for numeric (not integer) rle, though there might be rounding problems >> if cumsum gets large. A strategy might be to break the Rle into regions >> separated by islands of at least 'win' 0's (using runLength / runValue >> to identify candidate break points), which allows one to reset the >> cumsum. Some inspiration might come from >> http://www.mail-archive.com/r-help at r-project.org/msg75280.html >> >> Also the end points might need fiddling (e.g., by padding rle with 'win' >> trailing zeros, which is I think in effect what successiveViews does. >> >> Martin >> >>> >>> Thank you very much >>> Michael >>> >>> Code example: >>> >>> diff.op <- function(x, lrprop=1/2) { >>> len = length(x) >>> i = ceiling(len*lrprop) >>> (sum(x[i:len]) - sum(x[1:i])) / len >>> } >>> >>> sliding.window.apply <- function(object, width, fun, ...) { >>> x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) >>> return (Rle(viewApply(x, fun, ...))) >>> } >>> >>> sliding.window.mean <- function(object, width) { >>> x <- trim(successiveViews(subject=object, width=rep(width, ceiling(length(object)) ), gap=-width+1)) >>> return (viewMeans(x)) >>> } >>> >>> rle <- Rle(1:10000) >>> >>>> system.time(sliding.window.mean(rle, 30)) >>> user system elapsed >>> 0.036 0.004 0.098 >>>> system.time(sliding.window.apply(rle, 30, mean)) >>> user system elapsed >>> 4.380 0.065 6.010 >>>> system.time(sliding.window.apply(rle, 30, diff.op)) >>> user system elapsed >>> 38.857 0.204 39.127 >>> >>>> sessionInfo() >>> R version 2.11.1 (2010-05-31) >>> x86_64-apple-darwin9.8.0 >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] IRanges_1.6.6 >>> >>> loaded via a namespace (and not attached): >>> [1] annotate_1.26.1 AnnotationDbi_1.10.2 Biobase_2.8.0 DBI_0.2-5 DESeq_1.0.4 >>> [6] genefilter_1.30.0 geneplotter_1.26.0 grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-1 >>> [11] splines_2.11.1 survival_2.35-8 xtable_1.5-6 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > >
ADD REPLYlink written 7.1 years ago by Michael Dondrup550
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 162 users visited in the last hour