Question: How do I merge a list of GRanges?
endrebak85 wrote (17 months ago):

I have a list of GRanges objects that look like the following:

> l3
[[1]]
GRanges object with 5000 ranges and 4 metadata columns:
         seqnames             ranges strand   |    Island
            <Rle>          <IRanges>  <Rle>   | <integer>
     [1]     chr1     [10050, 10051]      *   |         0
     [2]     chr1     [10100, 10101]      *   |         0
     [3]     chr1     [10200, 10201]      *   |         0
     [4]     chr1     [10250, 10251]      *   |         0
     [5]     chr1     [13250, 13251]      *   |         0
     ...      ...                ...    ... ...       ...
  [4996]     chr1 [1261250, 1261251]      *   |         0
  [4997]     chr1 [1261300, 1261301]      *   |         1
  [4998]     chr1 [1261350, 1261351]      *   |         1
  [4999]     chr1 [1261400, 1261401]      *   |         1
  [5000]     chr1 [1261600, 1261601]      *   |         1
         data.animal.Exp1_12h_PolII.bed data.animal.Exp1_12h_Input.bed
                              <integer>                      <integer>
     [1]                              0                              1
     [2]                              1                              2
     [3]                              1                              1
     [4]                              1                              0
     [5]                              1                              0
     ...                            ...                            ...
  [4996]                              0                              1
  [4997]                              3                              0
  [4998]                              1                              1
  [4999]                              1                              0
  [5000]                              2                              0
         data.animal.Exp2_12h_Input.bed
                              <integer>
     [1]                              1
     [2]                              0
     [3]                              1
     [4]                              0
     [5]                              0
     ...                            ...
  [4996]                              1
  [4997]                              0
  [4998]                              0
  [4999]                              0
  [5000]                              0
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

[[2]]
GRanges object with 5000 ranges and 5 metadata columns:
         seqnames             ranges strand   |    Island
            <Rle>          <IRanges>  <Rle>   | <integer>
     [1]     chr1     [10000, 10001]      *   |         0
     [2]     chr1     [10050, 10051]      *   |         0
     [3]     chr1     [10100, 10101]      *   |         0
     [4]     chr1     [10150, 10151]      *   |         0
     [5]     chr1     [10200, 10201]      *   |         0
     ...      ...                ...    ... ...       ...
  [4996]     chr1 [1119900, 1119901]      *   |         0
  [4997]     chr1 [1120000, 1120001]      *   |         0
  [4998]     chr1 [1120100, 1120101]      *   |         0
  [4999]     chr1 [1120150, 1120151]      *   |         0
  [5000]     chr1 [1120200, 1120201]      *   |         0
         data.animal.Exp1_15h_PolII.bed data.animal.Exp2_15h_PolII.bed
                              <integer>                      <integer>
     [1]                              1                              0
     [2]                              0                              0
     [3]                              2                              0
     [4]                              1                              0
     [5]                              0                              0
     ...                            ...                            ...
  [4996]                              0                              0
  [4997]                              0                              0
  [4998]                              0                              0
  [4999]                              0                              0
  [5000]                              0                              0
         data.animal.Exp1_15h_Input.bed data.animal.Exp2_15h_Input.bed
                              <integer>                      <integer>
     [1]                              0                              2
     [2]                              1                              0
     [3]                              0                              1
     [4]                              0                              0
     [5]                              1                              1
     ...                            ...                            ...
  [4996]                              0                              1
  [4997]                              0                              1
  [4998]                              1                              0
  [4999]                              1                              0
  [5000]                              0                              1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

The real list will contain many more GRanges objects. Notice that each one has several metadata columns, and that the set of columns differs between objects.

What I want is a GRanges object containing the union of all the ranges and the metadata. I could loop over the list, taking the union of each pair of GRanges, but the union loses the metadata.

Is there a way to merge all my GRanges into one giant GRanges, with metadata? (What I ultimately want is a data frame of counts that I can input into limma.)

(Question last modified 17 months ago by Hervé Pagès.)

I am curious as to how you get this list of GRanges. Wouldn't it make sense to create an eSet or SummarizedExperiment for use with limma?
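For illustration, a minimal sketch of the SummarizedExperiment route, assuming the SummarizedExperiment package is installed and that the counts have already been combined into one matrix whose rows correspond to a single set of ranges. A SummarizedExperiment holds one set of rows shared by all assays, so it does not merge differing row sets by itself. The bins, sample names, and values below are hypothetical.

```r
## Sketch only: hypothetical bins and counts, not the poster's data.
library(SummarizedExperiment)  # also attaches GenomicRanges

# Two hypothetical 2-bp bins on chr1
bins <- GRanges("chr1", IRanges(start = c(10050, 10100), width = 2))

# Hypothetical count matrix: one row per bin, one column per sample
counts <- matrix(c(0L, 1L, 1L, 2L), nrow = 2,
                 dimnames = list(NULL, c("PolII", "Input")))

# rowRanges must supply exactly one range per row of the assay
se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = bins)

# limma/voom would then work on the plain matrix, e.g.
# voom(assay(se, "counts"), design) with a design matrix of your own
dim(se)
```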

Comment by Michael Lawrence, 17 months ago.

Ah, so if I have multiple GRanges (with different, partially overlapping rows) in a SummarizedExperiment, limma can treat it like one matrix? Even voom? (I have very little Bioconductor experience, unfortunately.)

Reply by endrebak85, 17 months ago.

Answer by Hervé Pagès (United States), 17 months ago:

Hi,

Merge is a pretty vague term. My understanding is that you want to concatenate all the GRanges objects in the list. In the R world, this concatenation is performed with c(). One complication here is that the arguments to your call to c() are in a list, so you need to use do.call("c", list_of_arguments):

l1 <- list(1:5, 21:22, 31:34)
do.call("c", l1)  # same as c(l1[[1]], l1[[2]], l1[[3]])
# [1]  1  2  3  4  5 21 22 31 32 33 34

So with your list of GRanges objects do.call("c", l3) should do what you want.
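A minimal sketch of the same call on GRanges, assuming GenomicRanges is installed and that every element carries the same metadata columns (here a hypothetical score column). Note that c() appends the ranges as new rows. As far as I know, it also expects matching metadata column names across the elements, and the l3 shown in the question does not have them.

```r
## Sketch with toy GRanges that share one hypothetical 'score' column.
library(GenomicRanges)

gr_list <- list(
  GRanges("chr1", IRanges(c(100, 200), width = 2), score = c(5L, 7L)),
  GRanges("chr1", IRanges(150, width = 2), score = 0L)
)

# Equivalent to c(gr_list[[1]], gr_list[[2]]): ranges are stacked
# vertically (rows appended), not combined side by side
stacked <- do.call(c, gr_list)
length(stacked)
mcols(stacked)$score
```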

H.


Thanks. What I want to do is create a new GRanges object. If it is a concatenation, it is a horizontal one: the new GRanges should have n * 4 metadata columns, where n is the number of GRanges objects and 4 is the number of metadata columns in each original GRanges.

So if I have two GRanges objects,

a:

chr1 100 101 5 6
chr1 200 201 7 8

and b:

chr1 150 151 0 1
chr1 200 201 1 1

I want to create a new GRanges object like this (a + b):

chr1 100 101 5 6 0 0
chr1 150 151 0 0 0 1
chr1 200 201 7 8 1 1

This should preferably work for an arbitrary number of GRanges objects. Perhaps this is not possible? Sorry for being unclear in my original question.
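One way to get exactly the table above is sketched here. It swaps in plain data.frames and base::merge() instead of a GRanges method. It does a full outer join on exact coordinates, not on overlaps, and Reduce() extends the binary merge to a list of any length. The column names a1, a2, b1, b2 are hypothetical. Any column shared by several inputs (such as Island) would come back with .x/.y suffixes, so you would drop or rename it first. GenomicRanges::makeGRangesFromDataFrame(merged, keep.extra.columns = TRUE) could turn the result back into a GRanges.

```r
## Sketch: horizontal merge of the toy example with base R only.
## a1, a2, b1, b2 are hypothetical sample column names.
a <- data.frame(seqnames = "chr1", start = c(100, 200), end = c(101, 201),
                a1 = c(5, 7), a2 = c(6, 8))
b <- data.frame(seqnames = "chr1", start = c(150, 200), end = c(151, 201),
                b1 = c(0, 1), b2 = c(1, 1))

# Full outer join on the coordinates; a range missing from one table gets NA.
# Reduce() folds the binary merge over a list of any length.
merged <- Reduce(function(x, y) merge(x, y, by = c("seqnames", "start", "end"),
                                      all = TRUE),
                 list(a, b))

# Treat "range absent from a sample" as a count of zero
merged[is.na(merged)] <- 0
merged
```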

Reply by endrebak85, 17 months ago.

I am working on a merge() method now. It is only a binary merge but it should do the trick with Reduce().

Reply by Michael Lawrence, 17 months ago.

OK, this was added in S4Vectors 0.11.7, so Reduce(merge, l3) should work. NAs are filled into the missing cells, so if you want those as zeros, you will need to convert them explicitly.
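The explicit conversion could look like the following minimal sketch. It takes a merged GRanges, such as the result of Reduce(merge, l3), and returns an integer count matrix with NAs set to zero, which is the kind of input limma/voom expects. merged_gr and the helper counts_from_granges() are hypothetical stand-ins with toy values. Any non-count column (such as Island) should be dropped before conversion.

```r
## Sketch: merged GRanges -> count matrix with NA replaced by 0.
## 'merged_gr' and 'counts_from_granges' are hypothetical, for illustration.
library(GenomicRanges)

merged_gr <- GRanges("chr1", IRanges(c(100, 150, 200), width = 2),
                     a1 = c(5L, NA, 7L), b1 = c(NA, 0L, 1L))

counts_from_granges <- function(gr) {
  counts <- as.matrix(mcols(gr))   # DataFrame of counts -> matrix
  counts[is.na(counts)] <- 0L      # range absent from a sample -> zero count
  # Label rows by their coordinates, e.g. "chr1:100-101"
  rownames(counts) <- paste0(as.character(seqnames(gr)), ":",
                             start(gr), "-", end(gr))
  counts
}

counts <- counts_from_granges(merged_gr)
counts
```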


Reply by Michael Lawrence, 17 months ago.

The do.call() method did not work for me (the result remained a list). However, the following, from the "GRangesList-class {GenomicRanges}" R documentation, did:

"unlist(x, recursive = TRUE, use.names = TRUE): Concatenates the elements of x into a single GRanges object."

I'm still looking for a way to take the union of all ranges in the resulting GRanges object (so I end up with fewer but larger intervals), preferably without looping.

Reply by lovro010, 5 months ago.

Concatenating two lists results in a list, so I guess you're seeing the expected behavior. You probably want to provide a reproducible example of the data you are starting with (minimal, just a few lines to create an artificial data object) and an explicit indication of what you want at the end.

?"relist,ANY,PartitioningByEnd-method" describes relist() (typically used in an unlist, transform, relist pattern for 'vectorized' operations on ranges, rather than iterating over the elements of a list) and regroup() (form into a new skeleton without an intermediate operation). Perhaps the 'union' you want is the within-GRanges operation reduce()?
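A minimal sketch of the unlist-then-reduce approach on toy ranges, assuming GenomicRanges is installed. unlist() concatenates the list elements into one GRanges, and reduce() then merges ranges that overlap or touch. Two caveats: reduce() drops metadata columns (its with.revmap = TRUE argument records which input ranges fed each output range), and calling reduce() directly on a GRangesList reduces within each element rather than across the whole list.

```r
## Sketch: union of all ranges from several GRanges (toy data, no metadata).
library(GenomicRanges)

grl <- GRangesList(
  GRanges("chr1", IRanges(c(1, 20), c(10, 30))),
  GRanges("chr1", IRanges(c(5, 40), c(15, 50)))
)

# Flatten to one GRanges, then collapse overlapping/adjacent ranges:
# [1,10] and [5,15] become [1,15]; [20,30] and [40,50] stay separate
merged <- reduce(unlist(grl))
merged
```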


Reply by Martin Morgan, 5 months ago.