Question

setdiff for GenomicRanges

Benilton Carvalho ★ 4.3k

@benilton-carvalho-1375

Brazil/Campinas/UNICAMP

Hi everyone,

I'm wondering if the following is the expected behaviour for setdiff:

library(GenomicRanges)
gr1 <- GRanges('a', IRanges(c(1, 3), c(2, 9)))
gr2 <- GRanges('a', IRanges(20, 30))
gr3 <- GRanges('a', IRanges(c(1, 4), c(2, 9)))
diff1 <- setdiff(gr1, gr2)
diff2 <- setdiff(gr3, gr2)

I expected to get gr1 back, since the intersection between gr1 and gr2 is empty. Instead, the resulting object diff1 is reduce(gr1): the adjacent ranges 1-2 and 3-9 come back merged into one. To be clear, I expected something analogous to diff2, where the two ranges stay separate.

Many thanks, benilton

genomicranges • 6.7k views

score 4 · Accepted Answer · 2016-09-14

Michael Lawrence ★ 11k

@michael-lawrence-3846

United States

Yes, this is the expected behavior: setdiff() is a set operation, and all set operations imply reduce(). We view the ranges as sets of integer positions, and a set contains each element only once, so overlapping or adjacent ranges are merged in the result.
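As a small illustration of the point above, the following sketch reuses the objects from the question and checks that the result of setdiff() covers the same positions as reduce(gr1). It assumes GenomicRanges is installed and that reduce() keeps its default behaviour of merging adjacent ranges.

```r
library(GenomicRanges)

gr1 <- GRanges("a", IRanges(c(1, 3), c(2, 9)))
gr2 <- GRanges("a", IRanges(20, 30))
gr3 <- GRanges("a", IRanges(c(1, 4), c(2, 9)))

# 1-2 and 3-9 are adjacent: as a set of integers they cover 1..9 with no gap,
# so the set difference is reported as a single merged range.
diff1 <- setdiff(gr1, gr2)
reduced1 <- reduce(gr1)

# 1-2 and 4-9 leave position 3 uncovered, so the two ranges stay separate.
diff2 <- setdiff(gr3, gr2)

# Compare coordinates rather than whole objects.
same_as_reduce <- identical(start(diff1), start(reduced1)) &&
  identical(end(diff1), end(reduced1))
```

This is why diff2 looked like gr3 in the question: the gap at position 3 keeps the two ranges apart, so there is nothing for reduce() to merge.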

To subtract from each range in one set (A) all the ranges in another set (B) that overlap it, without merging the ranges in A, find the overlaps, group the B ranges by the A range they overlap, and use psetdiff():

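# For each range in gr1, find the gr2 ranges that overlap it
hits <- findOverlaps(gr1, gr2)
# Group the overlapping gr2 ranges by gr1 range (one list element per gr1 range)
grl <- extractList(gr2, as(hits, "List"))
# Subtract each group from its corresponding gr1 range
psetdiff(gr1, grl)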

But note that you get back a GRangesList, since ranges in A can be split. You'll need to think about how to deal with those.
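One possible way to handle the resulting GRangesList, sketched here under assumptions, is to flatten it with unlist() and keep an index back to the originating range. The object `b` below is a hypothetical second set chosen so that one range actually gets split; `flat` and `origin` are illustrative names, not part of the original answer.

```r
library(GenomicRanges)

gr1 <- GRanges("a", IRanges(c(1, 3), c(2, 9)))
# Hypothetical set to subtract: it overlaps only the second range of gr1.
b <- GRanges("a", IRanges(5, 6))

# Same recipe as above: group the overlapping ranges of b by range of gr1.
hits <- findOverlaps(gr1, b)
grl <- extractList(b, as(hits, "List"))
pieces <- psetdiff(gr1, grl)

# Flatten to a plain GRanges; the ranges are not reduced, so adjacent
# pieces coming from different gr1 ranges stay separate.
flat <- unlist(pieces, use.names = FALSE)

# Record which range of gr1 each piece came from.
origin <- rep(seq_along(gr1), elementNROWS(pieces))
```

Whether flattening is appropriate depends on the downstream analysis; keeping the GRangesList preserves the grouping by original range directly.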