GenomicRanges: Take union of GRanges and keep metadata?
2
1
Entering edit mode
endrebak85 ▴ 40
@endrebak85-10660
Last seen 5.4 years ago
github.com/endrebak/

Let us say I have two GenomicRanges objects that look like this:

library(GenomicRanges)

 gr1 <- GRanges("chr1", IRanges(c(1, 11, 21), c(9, 19, 29), names=paste0("rsid:", letters[1:3])), score=c(6,7,8))

gr2 <- GRanges("chr1", IRanges(c(1,11,31), c(9,19,39), names=paste0("dnase:", letters[1:3])), score=c(0,1,5))

I want to take the union of bins, keeping the metadata, so that I end up with:

GRanges object with 4 ranges and 2 metadata columns:
          seqnames    ranges strand |     score  score
             <Rle> <IRanges>  <Rle> | <numeric>  <numeric>
  dnase:a     chr1  [ 1,  9]      * |         0  6
  dnase:b     chr1  [11, 19]      * |         1  7
  rsid:c     chr1  [21, 29]      * |          0  8
  dnase:c     chr1  [31, 39]      * |         5  0

Ps. the final names of the sequences do not matter, ideally they should have none.

Ps2. I have more than two GenomicRanges, so great if you can solve this by calling reduce.

Edit: Currently I am here:

u = union(gr1, gr2)

values(u)$gr1 = values(gr1)

But this leads to

Error in `[[<-`(`*tmp*`, name, value = <S4 object of class "DataFrame">) :
  3 elements in value to replace 4 elements

With the method I am attempting to use  I need some way to only set values on a subset of u.

This does not work for some reason: 

elementMetadata(u[start(u) == start(gr1)])$gr1 = elementMetadata(gr1)[,1]

If you do 

elementMetadata(u)$gr1 = 0

it works

However, when I try this for gr2 it does not work:

elementMetadata(u)$gr2 = 0
 elementMetadata(u[start(u) == start(gr2)])$gr2 = elementMetadata(gr2)[,1]

The result is 

GRanges object with 4 ranges and 2 metadata columns:
      seqnames    ranges strand |       gr1       gr2
         <Rle> <IRanges>  <Rle> | <numeric> <numeric>
  [1]     chr1  [ 1,  9]      * |         6         0
  [2]     chr1  [11, 19]      * |         7         0
  [3]     chr1  [21, 29]      * |         8         0
  [4]     chr1  [31, 39]      * |         0         0
GRange genomicranges • 5.8k views
ADD COMMENT
0
Entering edit mode
S4Vectors Version:            0.8.11

GenomicRanges Version:            1.22.4

(both from bioconda)

ADD REPLY
2
Entering edit mode
@martin-morgan-1513
Last seen 2 days ago
United States

I'm not sure that union() is doing what you think it is? It treats each range as a set of integers, and provides a compact way of representing the union of the set.

> union(GRanges("A:1-5"), GRanges("A:3-7"))
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        A    [1, 7]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

So generally the mapping will be many-to-one. I'd guess that you were instead aiming for

u = unique(c(granges(gr1), granges(gr2)))

And then your approach but with match would seem to be good enough

mcols(u)[match(gr1, u), "gr1"] = gr1$score
mcols(u)[match(gr2, u), "gr2"] = gr2$score
ADD COMMENT
0
Entering edit mode

Your solution is great. The only thing that is a bit hard is to set the NA values to zero afterwards.

ADD REPLY
0
Entering edit mode
endrebak85 ▴ 40
@endrebak85-10660
Last seen 5.4 years ago
github.com/endrebak/

Classier solutions welcome, but this works:

elementMetadata(u[overlapsAny(u, gr1)])$gr1 = elementMetadata(gr1)[,1]

elementMetadata(u[overlapsAny(u, gr2)])$gr2 = elementMetadata(gr2)[,1]

GRanges object with 4 ranges and 2 metadata columns:
      seqnames    ranges strand |       gr1       gr2
         <Rle> <IRanges>  <Rle> | <numeric> <numeric>
  [1]     chr1  [ 1,  9]      * |         6         0
  [2]     chr1  [11, 19]      * |         7         1
  [3]     chr1  [21, 29]      * |         8         0
  [4]     chr1  [31, 39]      * |         0         5
ADD COMMENT

Login before adding your answer.

Traffic: 746 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6