#### The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: seeking idea for proper output representation for multiple GRanges overlapping
1
2.6 years ago by
Italy
Jurat Shahidin60 wrote:

Dear all:

I am wondering that any good output representation for overlapping  of multiple GRanges by parallel. my object is grs[[1L]] as query but need to remove all background noise features from query (where I set up threshold for doing this task) before proceeding overlap with grs[-1L]. However, I sketched possible steps that might yields result that I expected, but couldn't decide proper and nice output representation I might try for set of overlap-hit index. Possible idea is needed.

Plus, I want to avoid of  using nested lapply for parallel overlapping, but how to process one to many overlap mapping in two parallel batches?  I tried of using mapply, but it return only fist parallel mapping, while rest were dropped, I don't know why. I hope someone could point me out possible approach to go. Can anyone propose possible ideas, or approach to have more compatible solution for this?

# simulated data

foo <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr3", "chr4"), c(5, 6, 4, 3)),
ranges=IRanges(seq(1, by=9, len=18), seq(6, by=9, len=18)),
rangeName=letters[seq(1:18)], score=sample(1:25, 18, replace = FALSE));
foo$pvalue <- 10^(score(foo)/-1); bar <- GRanges( seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(4, 7, 5, 4)), ranges=IRanges(seq(2, by=11, len=20), seq(8, by=11, len=20)), rangeName=letters[seq(1:20)], score=sample(1:25, 20, replace = FALSE)) bar$pvalue <- 10^(score(bar)/-1);
moo <- GRanges(
seqnames=Rle(c("chr1", "chr2", "chr3","chr4"), c(8, 7, 6, 4)),
ranges=IRanges(seq(4, by=11, len=25), seq(9, by=11, len=25)),
rangeName=letters[seq(1:25)], score=sample(1:25, 25, replace = FALSE))
moo\$pvalue <- 10^(score(moo)/-1);

How can I improve my solution? what's the better representation for output of multiple GRanges by parallel? I will be grateful if anyone give me possible ideas to get through this issue. Many thanks

Best regards:

JS

modified 2.3 years ago • written 2.6 years ago by Jurat Shahidin60
Answer: Iterate through set of GRanges in the list - advise is needed
4
2.6 years ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi,

A first immediate observation is that, because you're walking on the space of all possible pairs of list elements, you're actually calling findOverlaps(grs[[i]], grs[[j]]) for all i != j. Given that the overlaps between grs[[i]] and grs[[j]] are the same as those between grs[[j]] and grs[[i]] (modulo transposition of the returned Hits object), you're basically calling findOverlaps() twice too many times. So an easy improvement would be to just walk on the pairs for which i < j. Another, more visual, way to think about this is to (mentally) plot the (i, j) dots for all 1 <= i , j <= length(grs). This is a square grid. Instead of walking the entire grid (except its diagonal) as you currently do, walk the upper triangle only i.e. the dots in the square grid located above (or below) the diagonal.

I'm mentioning this only because it's a very common pattern... but it's not going to help much here.

More importantly, you're again using a nested loop to create a nested list. We all know this will be inefficient. Even if you manage to speed up your code, the returned object (nested list) is not convenient or efficient to work with. So the 1st thing to do is to choose a better representation of your result. "Better" means: a representation that is both fast/easy to produce and that allows fast/easy downstream computations.

For example, an adjacency matrix for all the ranges in grs is very fast to compute:

gr <- unlist(grs, use.names=FALSE)
hits <- findOverlaps(gr)
adjmat[as.matrix(hits)] <- TRUE

adjmat contains the same information as your nested list but in a very different form. Depending on what you're going to do downstream, it might be directly usable as-is or require some transformation. Hard to know without knowing more about what your concrete goals are. You're not saying much, except that you want to "Iterate through set of GRanges in the list". This cannot be a goal per se, you certainly have something else in mind.

Cheers,

H.