Combining two GRangesLists
4
2
Entering edit mode
llevar ▴ 20
@llevar-11831
Last seen 8.0 years ago

Hello,

 

I'm trying to combine two GRangesList objects element-wise, to produce a single GRangesList.

 

gr1 <-
  GRanges(seqnames = "chr2", ranges = IRanges(3, 6))
gr2 <-
  GRanges(c("chr1", "chr1"), ranges = IRanges(c(7,13), width = 3))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2)

gr3 <-
  GRanges(seqnames = "chr2", ranges = IRanges(9, 12))
gr4 <-
  GRanges(c("chr1", "chr1"), ranges = IRanges(c(25,38), width = 3))
grl2 <- GRangesList("gr1" = gr1, "gr2" = gr2)

grl3 <- GRangesList(c(gr1, gr3), c(gr2, gr4))

 

Based on the example above I want to combine grl and grl2 to produce grl3. My real use-case has lists of 20k elements each and I would like to be able to do this as efficiently as possible. I wrote a for loop to do this via repeated calls to c() on each GRanges element and dumping them into a new list, but this takes a long time to run on real data, and seems like there should be a better way to do this.

Thanks in advance.

 

Sergei.

 

GRanges grangeslist • 8.6k views
ADD COMMENT
4
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

pc(grl, grl2)

ADD COMMENT
0
Entering edit mode

What does 'pc()' stand for? it's completely cryptic to me!

ADD REPLY
0
Entering edit mode

That's bad. It stands for parallel combine, like pmax() and pmin().

ADD REPLY
1
Entering edit mode
Mike Smith ★ 6.6k
@mike-smith
Last seen 2 hours ago
EMBL Heidelberg

Do the GRangesLists you're trying to combine always have the same number of elements?  If so then you can probably use mapply() e.g.

mapply(c, grl, grl2)
$gr1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2   [3,  6]      *
  [2]     chr2   [9, 12]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

$gr2
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr1  [25, 27]      *
  [4]     chr1  [38, 40]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

No idea how well it'll scale to 20,000 entries.

ADD COMMENT
0
Entering edit mode
llevar ▴ 20
@llevar-11831
Last seen 8.0 years ago

Thank you Mike and Michael for your responses. Michael's solution works in sub-second time. mapply didn't return after a few minutes on real input so I stopped it.

ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States

unlist() and split() are relatively fast on GRangesList / GRanges(), so

mergelists = function(x, y) {
    ux = c(unlist(x, use.names=FALSE), unlist(y, use.names=FALSE))
    f = rep(c(names(x), names(y)), c(lengths(x), lengths(y)))
    split(ux, f)
}

This differs from the pc() solution in that it respects the names of the list

> mergelists(grl[2:1], grl2)
GRangesList object of length 2:
$gr1 
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2    [3, 6]      *
  [2]     chr2    [3, 6]      *

$gr2 
GRanges object with 4 ranges and 0 metadata columns:
      seqnames   ranges strand
  [1]     chr1 [ 7,  9]      *
  [2]     chr1 [13, 15]      *
  [3]     chr1 [ 7,  9]      *
  [4]     chr1 [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> pc(grl[2:1], grl2)
GRangesList object of length 2:
$gr2 
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr2  [ 3,  6]      *

$gr1 
GRanges object with 3 ranges and 0 metadata columns:
      seqnames   ranges strand
  [1]     chr2 [ 3,  6]      *
  [2]     chr1 [ 7,  9]      *
  [3]     chr1 [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
ADD COMMENT
1
Entering edit mode

To do that, I think you can just use regroup():

grl3 <- c(grl, grl2)
regroup(grl3, names(grl3))
ADD REPLY

Login before adding your answer.

Traffic: 551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6