cluster (only one end overlapping) breakpoints using InteractionSet
tangming2005 ▴ 170
@tangming2005-6754

Hi there,
Thanks for this package. I have been using it to cluster my breakpoints, and I wrote up my notes here: http://crazyhottommy.blogspot.com/2016/03/breakpoints-clustering-for-structural.html
In the post, I used your method to cluster breakpoints which have both ends overlapping.

Now I have another question:

----|------|-------|------|----------------------------------------------------
       A              B
---------------------|--------|----------|-------|-----------------------------
                         C                   D
--------------------------------------------|---------|------------|------|----
                                                 E                    F


I have a GInteractions object. Among the pairs, breakpoint B overlaps with C, and D overlaps with E. I want to group these three interactions together and assign them the same ID, so that I know all three belong to one complex rearrangement event.

In the toy example below, the first three pairs should be grouped as one cluster:

library(InteractionSet)
all.regions <- GRanges(rep("chrA",8), IRanges(c(1,4,5,9,10,15,20,22), c(3,6,7,11,13,19,25,27)))
index.1 <- c(1,3,5,7)
index.2 <- c(2,4,6,8)

gi <- GInteractions(index.1, index.2, all.regions, mode = "strict")

gi

Thanks,
Ming

Aaron Lun ★ 27k
@alun

# Overlaps between the first anchor of each pair and either anchor of any pair
olap1 <- findOverlaps(anchors(gi, "first"), gi)
# Overlaps between the second anchor of each pair and either anchor of any pair
olap2 <- findOverlaps(anchors(gi, "second"), gi)
# Combined, de-duplicated overlaps between entries of gi
olap <- unique(Hits(c(queryHits(olap1), queryHits(olap2)),
                    c(subjectHits(olap1), subjectHits(olap2)),
                    length(gi), length(gi), sort.by.query=TRUE))


The olap object contains all pairs of entries in gi that share one or more overlapping anchor regions. This can then be used to construct a graph as described in C: manipulate bedpe format files, with clustering performed by identifying all connected nodes in the graph.
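For illustration, here is a minimal sketch of that graph step on the toy data from the question. It assumes the igraph package is installed; the choice of igraph and the column name cluster are my own assumptions for this example, not something prescribed by InteractionSet.

```r
library(InteractionSet)
library(igraph)  # assumption: igraph is used here for connected components

# Toy data from the question
all.regions <- GRanges(rep("chrA", 8),
                       IRanges(c(1, 4, 5, 9, 10, 15, 20, 22),
                               c(3, 6, 7, 11, 13, 19, 25, 27)))
gi <- GInteractions(c(1, 3, 5, 7), c(2, 4, 6, 8), all.regions, mode = "strict")

# Entries of gi that share at least one overlapping anchor
olap1 <- findOverlaps(anchors(gi, "first"), gi)
olap2 <- findOverlaps(anchors(gi, "second"), gi)
edges <- unique(cbind(c(queryHits(olap1), queryHits(olap2)),
                      c(subjectHits(olap1), subjectHits(olap2))))

# One node per entry of gi, one edge per overlapping pair of entries
g <- make_empty_graph(n = length(gi), directed = FALSE)
g <- add_edges(g, as.vector(t(edges)))

# Entries in the same connected component receive the same cluster ID
mcols(gi)$cluster <- components(g)$membership
gi
```

On the toy data, the first three pairs should end up with one shared ID and the fourth pair with a different one. Self-overlaps only add loops to the graph, which do not change the components.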

Note that greater efficiency can be obtained by computing the overlaps against regions(gi) and then expanding them based on the anchor IDs (i.e., using anchors with id=TRUE). This avoids expanding the GRanges when calling anchors in the findOverlaps calls above, which saves memory and time (as duplicated ranges don't have to be overlapped). However, it requires some care, so I would only bother doing it for large objects where speed really matters.
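As a rough sketch of working at the region level, the variant below does not expand the region overlaps back to pairs. Instead it builds a graph whose nodes are the regions, links overlapping regions and the two anchors of each pair, and reads each pair's cluster from its first anchor. This is my own variation on the idea rather than the exact procedure described above. It assumes igraph is installed, and cluster is a hypothetical variable name.

```r
library(InteractionSet)
library(igraph)  # assumption: igraph is used here for connected components

# Toy data from the question
all.regions <- GRanges(rep("chrA", 8),
                       IRanges(c(1, 4, 5, 9, 10, 15, 20, 22),
                               c(3, 6, 7, 11, 13, 19, 25, 27)))
gi <- GInteractions(c(1, 3, 5, 7), c(2, 4, 6, 8), all.regions, mode = "strict")

# Anchor indices into regions(gi); no GRanges are expanded here
a1 <- anchors(gi, type = "first", id = TRUE)
a2 <- anchors(gi, type = "second", id = TRUE)
reg <- regions(gi)

# Overlap each unique region once
reg.olap <- findOverlaps(reg, reg)

# Ignore regions that no pair uses, so they cannot bridge unrelated pairs
used <- seq_along(reg) %in% c(a1, a2)
keep <- used[queryHits(reg.olap)] & used[subjectHits(reg.olap)]

# Edges: overlapping regions, plus the two anchors of every pair
edges <- rbind(cbind(queryHits(reg.olap)[keep], subjectHits(reg.olap)[keep]),
               cbind(a1, a2))
g <- make_empty_graph(n = length(reg), directed = FALSE)
g <- add_edges(g, as.vector(t(edges)))

# Each pair inherits the component of its first anchor region
cluster <- components(g)$membership[a1]
```

Because the two anchors of a pair are always linked, taking the component of either anchor gives the same answer. Whether this is actually faster than the pair-level overlaps will depend on how many duplicated ranges the object contains.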