Question: Most efficient way to compute width of overlap of multiple features
0
5.6 years ago by
Charles Berry290
United States
Charles Berry290 wrote:
Michael Lawrence <lawrence.michael at="" ...=""> writes: > Thanks for your reply and for GenomicRanges. Please see inline comments below. > > On Tue, Jan 7, 2014 at 9:30 AM, Charles Berry <ccberry at="" ...=""> wrote: > > > Vince S. Buffalo <vsbuffalo <at=""> ...> writes: > > > > > > > > Hi Bioconductor folks, > > > > > > cover2.gr <- subset( as( coverage(gr2)>0, "GRanges"), score) > > > > seqinfocover2.gr) <- seqinfo(gr2) > > > > I think you just want cover2.gr <- reduce(gr2) instead of the above two > lines. Right. I should have recalled that. > > > > hits <- findOverlaps(gr,cover2.gr) > > > gr.over <- > > + pintersect(gr[queryHits(hits)],cover2.gr[subjectHits(hits)]) > > > > w <- width(ranges(hits, ranges(gr), rangescover2.gr))) *should* work as > an alternative to the pintersect call. > It does, but I cannot imagine I would ever have figured this idiom out. Maybe add an example to one of the help pages?? > > > gr.counts <- tapply(gr.over,queryHits(hits),FUN=function(x) > > sum(width(x))) > > > > This tapply is going to kill performance. Use something like: > > gr$overlap <- sum(splitAsList(w, factor(queryHits(hits), > seq_len(queryLength(hits))))) > Awesome! AFAICS, splitAsList has no help page. (It is used but not described in the GenomicRanges HOWTOs vignette.) I find GenomicRanges is an essential toolkit these days and would like to be able to figure out idioms like these. Anything that can be done to make idioms like these easier to find and use is much appreciated. > Or even: > > gr$overlap <- sum(relist(w, as(hits, "List"))) > Ditto the last comment, I think. Chuck
genomicranges • 432 views
modified 5.6 years ago by Michael Lawrence11k • written 5.6 years ago by Charles Berry290
Answer: Most efficient way to compute width of overlap of multiple features
0
5.6 years ago by
United States
Michael Lawrence11k wrote:
I'd love to come up with a nice way of communicating these aspects. In particular, the use of lists and partitioning, which is always overlooked. Just not sure of the best way. A cookbook? A series of workflow vignettes? A book-length manuscript that explores all the nooks and crannies? Just better man pages? There's just not enough time... Michael On Tue, Jan 7, 2014 at 12:30 PM, Charles Berry <ccberry@ucsd.edu> wrote: > Michael Lawrence <lawrence.michael@...> writes: > > > > > Thanks for your reply and for GenomicRanges. > > Please see inline comments below. > > > > > On Tue, Jan 7, 2014 at 9:30 AM, Charles Berry <ccberry@...> wrote: > > > > > Vince S. Buffalo <vsbuffalo <at=""> ...> writes: > > > > > > > > > > > Hi Bioconductor folks, > > > > > > > > > cover2.gr <- subset( as( coverage(gr2)>0, "GRanges"), score) > > > > > > seqinfocover2.gr) <- seqinfo(gr2) > > > > > > > I think you just want cover2.gr <- reduce(gr2) instead of the above two > > lines. > > Right. I should have recalled that. > > > > > > hits <- findOverlaps(gr,cover2.gr) > > > > gr.over <- > > > + pintersect(gr[queryHits(hits)],cover2.gr[subjectHits(hits)]) > > > > > > > w <- width(ranges(hits, ranges(gr), rangescover2.gr))) *should* work > as > > an alternative to the pintersect call. > > > > It does, but I cannot imagine I would ever have figured this idiom out. > > Maybe add an example to one of the help pages?? > > > > > > gr.counts <- tapply(gr.over,queryHits(hits),FUN=function(x) > > > sum(width(x))) > > > > > > > This tapply is going to kill performance. Use something like: > > > > gr$overlap <- sum(splitAsList(w, factor(queryHits(hits), > > seq_len(queryLength(hits))))) > > > > Awesome! > > AFAICS, splitAsList has no help page. (It is used but not described > in the GenomicRanges HOWTOs vignette.) > > I find GenomicRanges is an essential toolkit these days and would like > to be able to figure out idioms like these. > > Anything that can be done to make idioms like these easier to find > and use is much appreciated. > > > > Or even: > > > > gr$overlap <- sum(relist(w, as(hits, "List"))) > > > > Ditto the last comment, I think. > > Chuck > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]