Question

identify non-overlapping regions

Chee Lee ▴ 10

@chee-lee-4473

Hi all,

Is there a Bioconductor package that, given a data frame with a 'start' and an 'end' for each sequence region, can parse it so that the output is the set of regions that do not overlap? For example:

Input:

    Start  End
    1      10
    8      20
    13     34

Output:

    Start  End
    1      7
    11     12
    20     34

I know there is IRanges, but its reduce() function merges the input so that there are no redundant regions, which is not what I want (rather, I would like to remove those redundant regions).

Any help is much appreciated!

Thank you,
CL

IRanges • 1.3k views

updated 14.2 years ago by Steve Lianoglou ★ 13k • written 14.2 years ago by Chee Lee ▴ 10

score 0 · Answer 1 · 2011-02-07

Hi,

On Mon, Feb 7, 2011 at 9:34 AM, Chee Lee <cheelee at umich.edu> wrote:
> Hi all,
> Is there a bioconductor package that given a data frame with an 'end' and
> 'start' of each sequence region, I can parse it so that the output is set of
> regions that include only those regions that do not overlap? For example:

Your example didn't come through clearly at all -- next time, you can set up your data object in R, and use "dput" to paste it into an email in a way that we can then recover it in our R sessions.

For instance, I think you wanted to start with an example like this:

    R> library(IRanges)
    R> i <- IRanges(c(1, 8, 13), c(10, 20, 34))

Which looks like:

    R> i
    IRanges of length 3
        start end width
    [1]     1  10    10
    [2]     8  20    13
    [3]    13  34    22

The output of dput looks like:

    R> dput(i)
    new("IRanges"
        , start = c(1L, 8L, 13L)
        , width = c(10L, 13L, 22L)
        , NAMES = NULL
        , elementMetadata = NULL
        , elementType = "integer"
        , metadata = list()
    )

Which is something we can copy and paste into R to recover the original IRanges object.

Anyway, you were right to start by looking at IRanges, but you were wrong to give up so soon :-)

IRanges is an insanely "deep" library, so you should take time to read through its vignettes, and even look through its function list -- which you can get to via the "index" link at the bottom of any of the IRanges-specific help pages.

There are several ways to solve this problem; I'll show you one:

    R> slice(coverage(i), upper=1, rangesOnly=TRUE)
    IRanges of length 3
        start end width
    [1]     1   7     7
    [2]    11  12     2
    [3]    21  34    14

Look at the help for ?slice and ?coverage. Also, the "disjoin" function gets you close to what you want, as well.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
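For anyone wanting to see what the coverage-based approach is doing under the hood, here is a minimal base-R sketch with no IRanges dependency. It counts how many regions cover each position and keeps the stretches covered exactly once. The function name `unique_regions` is hypothetical (it is not part of IRanges), and the sketch assumes 1-based, inclusive, positive integer coordinates.

```r
# Hypothetical helper, not part of IRanges: return the stretches that are
# covered by exactly one input region (1-based, inclusive coordinates).
unique_regions <- function(start, end) {
  stopifnot(length(start) == length(end), all(start >= 1), all(start <= end))
  n <- max(end)
  # +1 at each region start, -1 just past each region end;
  # the running sum is the per-position coverage depth.
  delta <- tabulate(start, nbins = n + 1) - tabulate(end + 1, nbins = n + 1)
  depth <- cumsum(delta)[seq_len(n)]
  # Run-length encode "depth is exactly 1" and turn the TRUE runs into ranges.
  runs <- rle(depth == 1)
  run_end <- cumsum(runs$lengths)
  run_start <- run_end - runs$lengths + 1L
  data.frame(start = run_start[runs$values], end = run_end[runs$values])
}

# The example from this thread
regions <- data.frame(start = c(1, 8, 13), end = c(10, 20, 34))
result <- unique_regions(regions$start, regions$end)
print(result)  # three rows: 1-7, 11-12, 21-34
```

Because the sketch keeps only depth exactly 1, uncovered gaps between regions are dropped. With IRanges loaded, the disjoin route should give the same ranges via `d <- disjoin(i); d[countOverlaps(d, i) == 1]`. If your input has gaps and you use slice(), it is probably worth checking ?slice for the default `lower` bound and passing `lower=1` alongside `upper=1`.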