Best way to subset one IRanges by list of IRanges
@robert-m-flight-4158
United States

I have a case where I really want to generate a list of subsetted IRanges objects, where each one is the result of querying from a list of other IRanges objects.

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","b","c"))
slist <- split(subject, spartition)

sublist <- IRanges::subsetByOverlaps(query, slist)

# Error in (function (classes, fdef, mtable)  :
#  unable to find an inherited method for function ‘findOverlaps’ for signature ‘"IRanges", "CompressedIRangesList"’


What I would like to get is a list of length 3, where each entry contains whatever in query overlaps that entry of slist. Right now, when I try this code, each entry of sublist is empty (IRanges v 2.14.2), and I don't see anything in News that leads me to believe it would be any different.

So right now I'm just using purrr::map over the entries in the list, but I figured that if this were available in IRanges itself, it would be far more efficient.


I will also add that if I do:

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
spartition <- factor(c("a","b","c"))
slist <- split(subject, spartition)
qlist <- split(query, rep(1, 3))

sublist2 <- IRanges::subsetByOverlaps(qlist, slist)


The entries will be empty, and of length 1. If I reverse it and do

sublist3 <- IRanges::subsetByOverlaps(slist, qlist)


Then it will be of length 3, but each entry is a zero-length IRanges.


The reason this does not exist is that a RangesList defines ranges within separate spaces (typically chromosomes), named by the names of the list. An IRanges has no defined space. We could add a method that simply repeats the search across every space, but so far there has been no motivating use case. More details on yours would help.


I'm using raw IRanges functions to work on raw data points within a mass spectrum. The splitting into lists is to make it easier to do bplapply and furrr::future_map operations on the subsets of raw data points. So there are definitely no separate spaces like chromosomes in this type of application. There are separate scans of data, but they share the same range space, so they naturally get lumped together.


Would you be interested in contributing a subsetByOverlaps() method? Btw, I think you do need to reverse the arguments from your initial example, i.e., subsetByOverlaps(slist, query). The simplest thing to do is unlist(slist), perform the subset, then reform the list, which would be the slightly tricky part to do efficiently.
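A minimal sketch of the unlist / subset / relist idea, using the objects from the original post. Note that it filters the ranges in slist (the reversed-argument form), not the ranges in query. It assumes relist() with a list-like skeleton and subsetting an IRangesList by a LogicalList behave as documented in IRanges; the variable names flat, keep and kept are only illustrative.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
slist <- split(subject, factor(c("a", "b", "c")))

# Flatten the list into a single IRanges, dropping the list names
flat <- unlist(slist, use.names = FALSE)

# One overlap query over all ranges: TRUE where a range overlaps any range in query
keep <- overlapsAny(flat, query)

# Reshape the logical vector to the shape of slist, then subset list-wise
kept <- slist[relist(keep, slist)]
```

The overlap search runs once over the flattened ranges, so the cost of re-forming the list is limited to the relist() step.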


I think the arguments are correct, at least based on how I'm currently doing it. If you think of it as a list-wise operation, it becomes:

q_by_list <- lapply(slist, function(s_sub) {
  subsetByOverlaps(query, s_sub)
})


That is what I'm trying to achieve, where each entry in q_by_list holds the parts of query that overlap the corresponding entry of slist.


Is https://github.com/Bioconductor/IRanges the right place to submit a pull request with the above-mentioned method? It's probably best to continue this conversation there in an issue.


I see. In that case I'm not sure that is really a subset operation. It's more like extractList() except by overlap.
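As a rough illustration of what "extractList() by overlap" could look like, here is a sketch that runs a single findOverlaps() call on the flattened list and then regroups the query indices by list element. This is not an existing IRanges method; the names flat, group, hits, idx and q_by_list are hypothetical, and the result should match the lapply() version posted earlier in the thread.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
slist <- split(subject, factor(c("a", "b", "c")))

# Flatten slist and record which list element each flattened range came from
flat <- unlist(slist, use.names = FALSE)
group <- rep(seq_along(slist), elementNROWS(slist))

# One overlap search for the whole list
hits <- findOverlaps(query, flat)

# For each list element, collect the indices of the query ranges that hit it
idx <- split(queryHits(hits),
             factor(group[subjectHits(hits)], levels = seq_along(slist)))
idx <- lapply(idx, function(i) sort(unique(i)))

# Pull the matching query ranges out, one list element per entry of slist
q_by_list <- extractList(query, IntegerList(idx))
names(q_by_list) <- names(slist)
```

Elements of slist with no overlapping query ranges come back as zero-length IRanges, because the factor levels keep an empty slot for them.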