How do I merge a list of GRanges?
2
Entering edit mode
endrebak85 • 30
@endrebak85-10660
Last seen 17 months ago
github.com/endrebak/

I have a list of genomicRanges that look like the following:

> l3
[[1]]
GRanges object with 5000 ranges and 4 metadata columns:
         seqnames             ranges strand   |    Island
            <Rle>          <IRanges>  <Rle>   | <integer>
     [1]     chr1     [10050, 10051]      *   |         0
     [2]     chr1     [10100, 10101]      *   |         0
     [3]     chr1     [10200, 10201]      *   |         0
     [4]     chr1     [10250, 10251]      *   |         0
     [5]     chr1     [13250, 13251]      *   |         0
     ...      ...                ...    ... ...       ...
  [4996]     chr1 [1261250, 1261251]      *   |         0
  [4997]     chr1 [1261300, 1261301]      *   |         1
  [4998]     chr1 [1261350, 1261351]      *   |         1
  [4999]     chr1 [1261400, 1261401]      *   |         1
  [5000]     chr1 [1261600, 1261601]      *   |         1
         data.animal.Exp1_12h_PolII.bed data.animal.Exp1_12h_Input.bed
                              <integer>                      <integer>
     [1]                              0                              1
     [2]                              1                              2
     [3]                              1                              1
     [4]                              1                              0
     [5]                              1                              0
     ...                            ...                            ...
  [4996]                              0                              1
  [4997]                              3                              0
  [4998]                              1                              1
  [4999]                              1                              0
  [5000]                              2                              0
         data.animal.Exp2_12h_Input.bed
                              <integer>
     [1]                              1
     [2]                              0
     [3]                              1
     [4]                              0
     [5]                              0
     ...                            ...
  [4996]                              1
  [4997]                              0
  [4998]                              0
  [4999]                              0
  [5000]                              0
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

[[2]]
GRanges object with 5000 ranges and 5 metadata columns:
         seqnames             ranges strand   |    Island
            <Rle>          <IRanges>  <Rle>   | <integer>
     [1]     chr1     [10000, 10001]      *   |         0
     [2]     chr1     [10050, 10051]      *   |         0
     [3]     chr1     [10100, 10101]      *   |         0
     [4]     chr1     [10150, 10151]      *   |         0
     [5]     chr1     [10200, 10201]      *   |         0
     ...      ...                ...    ... ...       ...
  [4996]     chr1 [1119900, 1119901]      *   |         0
  [4997]     chr1 [1120000, 1120001]      *   |         0
  [4998]     chr1 [1120100, 1120101]      *   |         0
  [4999]     chr1 [1120150, 1120151]      *   |         0
  [5000]     chr1 [1120200, 1120201]      *   |         0
         data.animal.Exp1_15h_PolII.bed data.animal.Exp2_15h_PolII.bed
                              <integer>                      <integer>
     [1]                              1                              0
     [2]                              0                              0
     [3]                              2                              0
     [4]                              1                              0
     [5]                              0                              0
     ...                            ...                            ...
  [4996]                              0                              0
  [4997]                              0                              0
  [4998]                              0                              0
  [4999]                              0                              0
  [5000]                              0                              0
         data.animal.Exp1_15h_Input.bed data.animal.Exp2_15h_Input.bed
                              <integer>                      <integer>
     [1]                              0                              2
     [2]                              1                              0
     [3]                              0                              1
     [4]                              0                              0
     [5]                              1                              1
     ...                            ...                            ...
  [4996]                              0                              1
  [4997]                              0                              1
  [4998]                              1                              0
  [4999]                              1                              0
  [5000]                              0                              1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

The real list will contain many more granges. Notice that there are several data columns in each file.

What I want is a GRanges object containing the union of all the ranges and the metadata. I could loop over the list, taking the union of each pair of GRanges, but the union loses the metadata.

Is there a way to merge all my GRanges into one giant GRanges, with metadata? (What I ultimately want is a dataframe of counts so that I can input it into limma.)

GenomicRanges • 9.2k views
ADD COMMENTlink
1
Entering edit mode

I am curious as to how you get this list of GRanges. Wouldn't it make sense to create an eSet or SummarizedExperiment for use with limma?

ADD REPLYlink
0
Entering edit mode

Ah, so if I have multiple granges (with different, somewhat overlapping rows) in a summarizedexperiment, limma can treat it like one matrix? Voom even? (I have very little bioconductor experience, unfortunately.)

ADD REPLYlink
5
Entering edit mode
@herve-pages-1542
Last seen 16 hours ago
Seattle, WA, United States

Hi,

Merge is a pretty vague term. My understanding is that you want to concatenate all the GRanges objects in the list. In the R world, this concatenation is performed with c().One complication here is that the arguments to your call to c() are in a list. So you need to use do.call("c", list_of_arguments):

l1 <- list(1:5, 21:22, 31:34)
do.call("c", l1)  # same as c(l1[[1]], l1[[2]], l1[[3]])
# [1]  1  2  3  4  5 21 22 31 32 33 34

So with your list of GRanges objects do.call("c", l3) should do what you want.

H.

ADD COMMENTlink
0
Entering edit mode

Thanks. What I want to do is create a new granges object. If it is a concat, it is a horizontal concat. The new granges should have n * 4 metadata columns, where n is the number of granges, 4 is the number of metadata columns in each original grange.

So if I have two granges

a:

chr1 100 101 5 6

chr 200 201 7 8 

and b:

chr 150 151 0 1

chr 200 201 1 1 

I want to create a new granges object like this (a + b):

chr1 100 101 5 6 0 0

chr  150 151 0 0 0 1

chr 200 201 7 8 1 1

This should preferably work for an arbitrary number of granges objects. Perhaps this is not possible? Sorry for being unclear in my original q.

ADD REPLYlink
1
Entering edit mode

I am working on a merge() method now. It is only a binary merge but it should do the trick with Reduce().

ADD REPLYlink
3
Entering edit mode

Ok, added in S4Vectors 0.11.7, Reduce(merge, l3) should work. NAs are filled into the missing cells, so if you want those as zeros, you will need to convert them explicitly.

 

ADD REPLYlink
0
Entering edit mode

do.call method did not work for me (the result remained a list), however, the following from "GRangesList-class {GenomicRanges}"  R documentation did:

"unlist(x, recursive = TRUE, use.names = TRUE): Concatenates the elements of x into a single GRanges object."

I'm still looking for a method to produce a union of all lines in the resulting GRanges object (so I end up with fewer but larger intervals), preferably without looping.

ADD REPLYlink
0
Entering edit mode

Concatenating two lists results in a list, so I guess you're seeing the expected behavior. Probably you want to provide a reproducible example of the data you are starting with (minimal, just a few lines to create an artificial data object...) and some explicit indication of what you want at the end.

?"relist,ANY,PartitioningByEnd-method" mentions relist() (typically used as unlist, transform, relist for 'vectorized' operations on ranges rather than iterative operation on elements of a list) and regroup() (form into new skeleton without intermediate operation). 'union' might be the within-GRanges operation 'reduce()' ?

 

ADD REPLYlink
1
Entering edit mode
mmfansler • 30
@mmfansler-13248
Last seen 6 weeks ago
MSKCC | New York, NY

I suspect this may have changed, but with Bioconductor 3.12 packages, I can do

unlist(as(l3, "GRangesList))

which converts the base list object to a GRangesList and then uses the GRL-specific unlist to merge to single GRanges object.

ADD COMMENTlink
0
Entering edit mode

This was the only solution that worked for me, thanks! But don't forget closing parentheses :P

ADD REPLYlink

Login before adding your answer.

Similar Posts
Loading Similar Posts
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.3