Most efficient way to compute width of overlap of multiple features

0

Entering edit mode

Charles Berry ▴ 290

@charles-berry-5754

Last seen 5.1 years ago

United States

Michael Lawrence <lawrence.michael at="" ...=""> writes: > Thanks for your reply and for GenomicRanges. Please see inline comments below. > > On Tue, Jan 7, 2014 at 9:30 AM, Charles Berry <ccberry at="" ...=""> wrote: > > > Vince S. Buffalo <vsbuffalo <at=""> ...> writes: > > > > > > > > Hi Bioconductor folks, > > > > > > cover2.gr <- subset( as( coverage(gr2)>0, "GRanges"), score) > > > > seqinfocover2.gr) <- seqinfo(gr2) > > > > I think you just want cover2.gr <- reduce(gr2) instead of the above two > lines. Right. I should have recalled that. > > > > hits <- findOverlaps(gr,cover2.gr) > > > gr.over <- > > + pintersect(gr[queryHits(hits)],cover2.gr[subjectHits(hits)]) > > > > w <- width(ranges(hits, ranges(gr), rangescover2.gr))) *should* work as > an alternative to the pintersect call. > It does, but I cannot imagine I would ever have figured this idiom out. Maybe add an example to one of the help pages?? > > > gr.counts <- tapply(gr.over,queryHits(hits),FUN=function(x) > > sum(width(x))) > > > > This tapply is going to kill performance. Use something like: > > gr$overlap <- sum(splitAsList(w, factor(queryHits(hits), > seq_len(queryLength(hits))))) > Awesome! AFAICS, splitAsList has no help page. (It is used but not described in the GenomicRanges HOWTOs vignette.) I find GenomicRanges is an essential toolkit these days and would like to be able to figure out idioms like these. Anything that can be done to make idioms like these easier to find and use is much appreciated. > Or even: > > gr$overlap <- sum(relist(w, as(hits, "List"))) > Ditto the last comment, I think. Chuck

GenomicRanges GenomicRanges • 825 views

ADD COMMENT • link updated 10.3 years ago by Michael Lawrence ★ 11k • written 10.3 years ago by Charles Berry ▴ 290

0

Entering edit mode

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 2.4 years ago

United States

I'd love to come up with a nice way of communicating these aspects. In particular, the use of lists and partitioning, which is always overlooked. Just not sure of the best way. A cookbook? A series of workflow vignettes? A book-length manuscript that explores all the nooks and crannies? Just better man pages? There's just not enough time... Michael On Tue, Jan 7, 2014 at 12:30 PM, Charles Berry <ccberry@ucsd.edu> wrote: > Michael Lawrence <lawrence.michael@...> writes: > > > > > Thanks for your reply and for GenomicRanges. > > Please see inline comments below. > > > > > On Tue, Jan 7, 2014 at 9:30 AM, Charles Berry <ccberry@...> wrote: > > > > > Vince S. Buffalo <vsbuffalo <at=""> ...> writes: > > > > > > > > > > > Hi Bioconductor folks, > > > > > > > > > cover2.gr <- subset( as( coverage(gr2)>0, "GRanges"), score) > > > > > > seqinfocover2.gr) <- seqinfo(gr2) > > > > > > > I think you just want cover2.gr <- reduce(gr2) instead of the above two > > lines. > > Right. I should have recalled that. > > > > > > hits <- findOverlaps(gr,cover2.gr) > > > > gr.over <- > > > + pintersect(gr[queryHits(hits)],cover2.gr[subjectHits(hits)]) > > > > > > > w <- width(ranges(hits, ranges(gr), rangescover2.gr))) *should* work > as > > an alternative to the pintersect call. > > > > It does, but I cannot imagine I would ever have figured this idiom out. > > Maybe add an example to one of the help pages?? > > > > > > gr.counts <- tapply(gr.over,queryHits(hits),FUN=function(x) > > > sum(width(x))) > > > > > > > This tapply is going to kill performance. Use something like: > > > > gr$overlap <- sum(splitAsList(w, factor(queryHits(hits), > > seq_len(queryLength(hits))))) > > > > Awesome! > > AFAICS, splitAsList has no help page. (It is used but not described > in the GenomicRanges HOWTOs vignette.) > > I find GenomicRanges is an essential toolkit these days and would like > to be able to figure out idioms like these. > > Anything that can be done to make idioms like these easier to find > and use is much appreciated. > > > > Or even: > > > > gr$overlap <- sum(relist(w, as(hits, "List"))) > > > > Ditto the last comment, I think. > > Chuck > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 10.3 years ago Michael Lawrence ★ 11k

Login before adding your answer.