Question: setdiff for GenomicRanges
gravatar for Benilton Carvalho
21 months ago by
Benilton Carvalho4.2k wrote:

Hi everyone,

I'm wondering if the following is the expected behaviour for setdiff:

gr1 <- GRanges('a', IRanges(c(1, 3), c(2, 9)))
gr2 <- GRanges('a', IRanges(20, 30))
gr3 <- GRanges('a', IRanges(c(1, 4), c(2, 9)))
diff1 <- setdiff(gr1, gr2)
diff2 <- setdiff(gr3, gr2)

My expectation was to get gr1 back, given that the intersection between gr1 and gr2 is empty. But the resulting object diff1 is reduce(gr1). Just to be clear, I expected to get something analogous to diff2.

Many thanks, benilton



ADD COMMENTlink modified 21 months ago by Michael Lawrence10.0k • written 21 months ago by Benilton Carvalho4.2k
gravatar for Michael Lawrence
21 months ago by
Michael Lawrence10.0k
United States
Michael Lawrence10.0k wrote:

Yes, this is the expected behavior, because this is a set operation, and all set operations imply reduce(). We view the ranges as sets of integers, and sets must only contain unique elements.

To subtract all ranges in one set (B) that overlap each range in another (A), find the overlaps, group the B ranges by overlap, and use psetdiff():

hits <- findOverlaps(gr1, gr2)
grl <- extractList(gr2, as(hits, "List"))
psetdiff(gr1, grl)

But note that you get back a GRangesList, since ranges in A can be split. You'll need to think about how to deal with those.

ADD COMMENTlink written 21 months ago by Michael Lawrence10.0k

Many thanks! Understanding that ranges are seen as sets of integers was key!

ADD REPLYlink written 21 months ago by Benilton Carvalho4.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 301 users visited in the last hour