Question: cluster (only one end overlapping) breakpoints using InteractionSet
tangming2005 (United States) wrote, 3.2 years ago:

Hi there,
Thanks for this package. I have been using it to cluster my breakpoints, as described in my notes here.
In that post, I used your method to cluster breakpoints that have both ends overlapping.

Now I have another question:

       A                B
                         C                    D
                                                 E                     F

I have a GInteractions object. Among the pairs, breakpoint B overlaps with C, and D overlaps with E. I want to group these three interactions together and assign the same ID to them, so that I know all three belong to one complex rearrangement event.

In the toy example below, the first three pairs should be grouped as one:

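library(InteractionSet)

# Eight regions on chrA; regions (1,2), (3,4), (5,6) and (7,8) form the four pairs.
all.regions <- GRanges(rep("chrA",8), IRanges(c(1,4,5,9,10,15,20,22), c(3,6,7,11,13,19,25,27)))
index.1 <- c(1,3,5,7)
index.2 <- c(2,4,6,8)

gi <- GInteractions(index.1, index.2, all.regions, mode="strict")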



Answer: cluster (only one end overlapping) breakpoints using InteractionSet
Aaron Lun (Cambridge, United Kingdom) wrote, 3.2 years ago:

Continuing from your example above:

olap1 <- findOverlaps(anchors(gi, "first"), gi) # overlaps with first region
olap2 <- findOverlaps(anchors(gi, "second"), gi) # overlaps with second region
# combined overlaps, with duplicated pairs removed
olap <- unique(Hits(c(queryHits(olap1), queryHits(olap2)),
    c(subjectHits(olap1), subjectHits(olap2)),
    length(gi), length(gi)))

The olap object contains all pairs of entries in gi that share one or more overlapping anchor regions. This can then be used to construct a graph as described in C: manipulate bedpe format files, with clustering performed by identifying all connected nodes in the graph.
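As a rough sketch of that graph step on the toy data from the question: each interaction becomes a node, each overlapping pair becomes an edge, and the connected components are the clusters. The use of igraph here is my assumption (the linked post may use a different graph package), and the `cluster` metadata column name is purely illustrative.

```r
library(InteractionSet)

# Toy data from the question.
all.regions <- GRanges(rep("chrA", 8),
    IRanges(c(1,4,5,9,10,15,20,22), c(3,6,7,11,13,19,25,27)))
gi <- GInteractions(c(1,3,5,7), c(2,4,6,8), all.regions, mode="strict")

# Pairs of interactions that share at least one overlapping anchor.
olap1 <- findOverlaps(anchors(gi, "first"), gi)
olap2 <- findOverlaps(anchors(gi, "second"), gi)
edges <- unique(rbind(
    cbind(queryHits(olap1), subjectHits(olap1)),
    cbind(queryHits(olap2), subjectHits(olap2))))

# One node per interaction; igraph is called via :: to avoid masking
# Bioconductor generics (assumption: igraph is installed).
g <- igraph::make_graph(as.vector(t(edges)), n=length(gi), directed=FALSE)

# Connected components give one cluster ID per interaction.
membership <- as.integer(igraph::components(g)$membership)
mcols(gi)$cluster <- membership  # hypothetical column name
```

On this example the first three interactions should end up with the same ID, and the fourth with a different one.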

Note that greater efficiency can be obtained by overlapping against regions(gi) and then expanding the overlaps based on the anchor IDs (i.e., using anchors with id=TRUE). This avoids expanding the GRanges when calling anchors in the findOverlaps calls above, which saves memory and time (as duplicated ranges don't have to be overlapped). However, it requires some care, so I would only bother doing it for large objects where speed really matters.
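One possible way to do this, shown on the toy data, is to overlap the unique regions against themselves once and then map region indices back to interactions with a pair of merges. This is only a sketch under my own assumptions: the `use`, `ro` and `edges` names and the merge-based expansion are illustrative, not necessarily the most efficient route for very large objects.

```r
library(InteractionSet)

# Toy data from the question.
all.regions <- GRanges(rep("chrA", 8),
    IRanges(c(1,4,5,9,10,15,20,22), c(3,6,7,11,13,19,25,27)))
gi <- GInteractions(c(1,3,5,7), c(2,4,6,8), all.regions, mode="strict")

# Anchor IDs are indices into regions(gi), so no GRanges is expanded.
a1 <- anchors(gi, type="first", id=TRUE)
a2 <- anchors(gi, type="second", id=TRUE)

# Lookup table: which interaction uses which region (as either anchor).
use <- data.frame(region=c(a1, a2),
    interaction=rep(seq_along(gi), 2))

# Overlap each unique region once (self-overlaps included).
reg <- regions(gi)
ro <- as.data.frame(findOverlaps(reg, reg))  # queryHits, subjectHits

# Expand region-level overlaps to interaction-level pairs.
m <- merge(ro, use, by.x="queryHits", by.y="region")
m <- merge(m, use, by.x="subjectHits", by.y="region",
    suffixes=c(".q", ".s"))
edges <- unique(m[, c("interaction.q", "interaction.s")])
```

The resulting `edges` table plays the same role as olap above and can be passed to the same graph-based clustering step.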
