Combining two GRangesLists
2
Entering edit mode
llevar • 20
@llevar-11831
Last seen 4.2 years ago

Hello,

 

I'm trying to combine two GRangesList objects element-wise, to produce a single GRangesList.

 

gr1 <-
  GRanges(seqnames = "chr2", ranges = IRanges(3, 6))
gr2 <-
  GRanges(c("chr1", "chr1"), ranges = IRanges(c(7,13), width = 3))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2)

gr3 <-
  GRanges(seqnames = "chr2", ranges = IRanges(9, 12))
gr4 <-
  GRanges(c("chr1", "chr1"), ranges = IRanges(c(25,38), width = 3))
grl2 <- GRangesList("gr1" = gr1, "gr2" = gr2)

grl3 <- GRangesList(c(gr1, gr3), c(gr2, gr4))

 

Based on the example above I want to combine grl and grl2 to produce grl3. My real use-case has lists of 20k elements each and I would like to be able to do this as efficiently as possible. I wrote a for loop to do this via repeated calls to c() on each GRanges element and dumping them into a new list, but this takes a long time to run on real data, and seems like there should be a better way to do this.

Thanks in advance.

 

Sergei.

 

GRanges grangeslist • 3.5k views
ADD COMMENTlink
4
Entering edit mode
@michael-lawrence-3846
Last seen 5 days ago
United States

pc(grl, grl2)

ADD COMMENTlink
0
Entering edit mode

What does 'pc()' stand for? it's completely cryptic to me!

ADD REPLYlink
0
Entering edit mode

That's bad. It stands for parallel combine, like pmax() and pmin().

ADD REPLYlink
1
Entering edit mode
Mike Smith ♦ 4.8k
@mike-smith
Last seen 3 hours ago
EMBL Heidelberg / de.NBI

Do the GRangesLists you're trying to combine always have the same number of elements?  If so then you can probably use mapply() e.g.

mapply(c, grl, grl2)
$gr1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2   [3,  6]      *
  [2]     chr2   [9, 12]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

$gr2
GRanges object with 4 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr1  [25, 27]      *
  [4]     chr1  [38, 40]      *
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

No idea how well it'll scale to 20,000 entries.

ADD COMMENTlink
0
Entering edit mode
llevar • 20
@llevar-11831
Last seen 4.2 years ago

Thank you Mike and Michael for your responses. Michael's solution works in sub-second time. mapply didn't return after a few minutes on real input so I stopped it.

ADD COMMENTlink
0
Entering edit mode
@martin-morgan-1513
Last seen 3 days ago
United States

unlist() and split() are relatively fast on GRangesList / GRanges(), so

mergelists = function(x, y) {
    ux = c(unlist(x, use.names=FALSE), unlist(y, use.names=FALSE))
    f = rep(c(names(x), names(y)), c(lengths(x), lengths(y)))
    split(ux, f)
}

This differs from the pc() solution in that it respects the names of the list

> mergelists(grl[2:1], grl2)
GRangesList object of length 2:
$gr1 
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2    [3, 6]      *
  [2]     chr2    [3, 6]      *

$gr2 
GRanges object with 4 ranges and 0 metadata columns:
      seqnames   ranges strand
  [1]     chr1 [ 7,  9]      *
  [2]     chr1 [13, 15]      *
  [3]     chr1 [ 7,  9]      *
  [4]     chr1 [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
> pc(grl[2:1], grl2)
GRangesList object of length 2:
$gr2 
GRanges object with 3 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [ 7,  9]      *
  [2]     chr1  [13, 15]      *
  [3]     chr2  [ 3,  6]      *

$gr1 
GRanges object with 3 ranges and 0 metadata columns:
      seqnames   ranges strand
  [1]     chr2 [ 3,  6]      *
  [2]     chr1 [ 7,  9]      *
  [3]     chr1 [13, 15]      *

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
ADD COMMENTlink
1
Entering edit mode

To do that, I think you can just use regroup():

grl3 <- c(grl, grl2)
regroup(grl3, names(grl3))
ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.3