identify non-overlapping regions
Chee Lee ▴ 10
@chee-lee-4473
Hi all,

Is there a Bioconductor package that, given a data frame with a 'start' and 'end' for each sequence region, lets me parse it so that the output is the set of regions that do not overlap? For example:

Input:
    Start  End
    1      10
    8      20
    13     34

Output:
    Start  End
    1      7
    11     12
    20     34

I know there is IRanges, but the reduce() function reduces the input so that there are no redundant regions, which is not what I want (rather, I would like to remove those redundant regions).

Any help is much appreciated!

Thank you,
CL
@steve-lianoglou-2771
Hi,

On Mon, Feb 7, 2011 at 9:34 AM, Chee Lee <cheelee at umich.edu> wrote:
> Hi all,
> Is there a bioconductor package that given a data frame with an 'end' and
> 'start' of each sequence region, I can parse it so that the output is set of
> regions that include only those regions that do not overlap? For example:

Your example didn't come through clearly at all -- next time, you can set up your data object in R, and use "dput" to paste it into an email in a way that we can then recover it in our R sessions.

For instance, I think you wanted to start with an example like this:

    R> library(IRanges)
    R> i <- IRanges(c(1, 8, 13), c(10, 20, 34))

Which looks like:

    R> i
    IRanges of length 3
        start end width
    [1]     1  10    10
    [2]     8  20    13
    [3]    13  34    22

The output of dput looks like:

    R> dput(i)
    new("IRanges"
        , start = c(1L, 8L, 13L)
        , width = c(10L, 13L, 22L)
        , NAMES = NULL
        , elementMetadata = NULL
        , elementType = "integer"
        , metadata = list()
    )

Which is something we can copy and paste into R to recover the original IRanges object.

Anyway, you were right to start by looking at IRanges, but you were wrong to give up so soon :-)

IRanges is an insanely "deep" library, so you should take time to read through its vignettes, and even look through its function list -- which you can get to via the "index" link at the bottom of any of the IRanges-specific help pages.

There are several ways to solve this problem, I'll show you one:

    R> slice(coverage(i), upper=1, rangesOnly=TRUE)
    IRanges of length 3
        start end width
    [1]     1   7     7
    [2]    11  12     2
    [3]    21  34    14

Look at the help for ?slice and ?coverage. Also, the "disjoin" function gets you close to what you want, as well.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
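The answer notes that "disjoin" gets close without spelling out the remaining step. The following is a minimal sketch of one way to finish that route, not necessarily what the author had in mind: it starts from a data frame as in the question, splits the ranges into disjoint pieces, and keeps only the pieces hit by exactly one input range. The data frame `regions` and the variable names are assumptions made for illustration; `IRanges()`, `disjoin()` and `countOverlaps()` are existing IRanges functions.

```r
library(IRanges)

# Hypothetical input data frame mirroring the example in the question.
regions <- data.frame(start = c(1, 8, 13), end = c(10, 20, 34))

# Convert the data frame columns into an IRanges object.
i <- IRanges(start = regions$start, end = regions$end)

# Cut the input at every range boundary so that no two pieces overlap.
pieces <- disjoin(i)

# Keep only the pieces that are overlapped by exactly one input range,
# i.e. drop every stretch shared by two or more of the original regions.
unique_pieces <- pieces[countOverlaps(pieces, i) == 1]

# Back to a plain data frame with just the start and end columns.
result <- as.data.frame(unique_pieces)[, c("start", "end")]
print(result)
```

For the three example ranges this should give 1-7, 11-12 and 21-34, the same regions as the slice(coverage(i), ...) call. Position 20 belongs to both 8-20 and 13-34, so the last region starts at 21.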