The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: Combining two GRangesLists
2
gravatar for llevar
2.3 years ago by
llevar20
llevar20 wrote:

Hello,

 

I'm trying to combine two GRangesList objects element-wise, to produce a single GRangesList.

 

gr1 <-
  GRanges(seqnames = "chr2", ranges = IRanges(3, 6))
gr2 <-
  GRanges(c("chr1", "chr1"), ranges = IRanges(c(7,13), width = 3))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2)

gr3 <-
  GRanges(seqnames = "chr2", ranges = IRanges(9, 12))
gr4 <-
  GRanges(c("chr1", "chr1"), ranges = IRanges(c(25,38), width = 3))
grl2 <- GRangesList("gr1" = gr1, "gr2" = gr2)

grl3 <- GRangesList(c(gr1, gr3), c(gr2, gr4))

 

Based on the example above I want to combine grl and grl2 to produce grl3. My real use-case has lists of 20k elements each and I would like to be able to do this as efficiently as possible. I wrote a for loop to do this via repeated calls to c() on each GRanges element and dumping them into a new list, but this takes a long time to run on real data, and seems like there should be a better way to do this.

Thanks in advance.

 

Sergei.

 

granges grangeslist • 1.4k views
ADD COMMENTlink modified 2.3 years ago by Martin Morgan ♦♦ 22k • written 2.3 years ago by llevar20
Answer: Combining two GRangesLists
4
gravatar for Michael Lawrence
2.3 years ago by
United States
Michael Lawrence10k wrote:

pc(grl, grl2)

ADD COMMENTlink written 2.3 years ago by Michael Lawrence10k

What does 'pc()' stand for? it's completely cryptic to me!

ADD REPLYlink written 2.3 years ago by Martin Morgan ♦♦ 22k

That's bad. It stands for parallel combine, like pmax() and pmin().

ADD REPLYlink written 2.3 years ago by Michael Lawrence10k
Answer: Combining two GRangesLists
1
gravatar for Mike Smith
2.3 years ago by
Mike Smith3.2k
EMBL Heidelberg / de.NBI
Mike Smith3.2k wrote:

Do the GRangesLists you're trying to combine always have the same number of elements?  If so then you can probably use mapply() e.g.

mapply(c, grl, grl2)
$gr1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2   [3,  6]      *
  [2]     chr2   [9, 12]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

$gr2
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr1  [25, 27]      *
  [4]     chr1  [38, 40]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

No idea how well it'll scale to 20,000 entries.

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Mike Smith3.2k
Answer: Combining two GRangesLists
0
gravatar for llevar
2.3 years ago by
llevar20
llevar20 wrote:

Thank you Mike and Michael for your responses. Michael's solution works in sub-second time. mapply didn't return after a few minutes on real input so I stopped it.

ADD COMMENTlink written 2.3 years ago by llevar20
Answer: Combining two GRangesLists
0
gravatar for Martin Morgan
2.3 years ago by
Martin Morgan ♦♦ 22k
United States
Martin Morgan ♦♦ 22k wrote:

unlist() and split() are relatively fast on GRangesList / GRanges(), so

mergelists = function(x, y) {
    ux = c(unlist(x, use.names=FALSE), unlist(y, use.names=FALSE))
    f = rep(c(names(x), names(y)), c(lengths(x), lengths(y)))
    split(ux, f)
}

This differs from the pc() solution in that it respects the names of the list

> mergelists(grl[2:1], grl2)
GRangesList object of length 2:
$gr1 
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2    [3, 6]      *
  [2]     chr2    [3, 6]      *

$gr2 
GRanges object with 4 ranges and 0 metadata columns:
      seqnames   ranges strand
  [1]     chr1 [ 7,  9]      *
  [2]     chr1 [13, 15]      *
  [3]     chr1 [ 7,  9]      *
  [4]     chr1 [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> pc(grl[2:1], grl2)
GRangesList object of length 2:
$gr2 
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr2  [ 3,  6]      *

$gr1 
GRanges object with 3 ranges and 0 metadata columns:
      seqnames   ranges strand
  [1]     chr2 [ 3,  6]      *
  [2]     chr1 [ 7,  9]      *
  [3]     chr1 [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Martin Morgan ♦♦ 22k
1

To do that, I think you can just use regroup():

grl3 <- c(grl, grl2)
regroup(grl3, names(grl3))
ADD REPLYlink written 2.3 years ago by Michael Lawrence10k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 390 users visited in the last hour