How to get the intersect and get the difference of a grange object while keeping meta data
2
1
Entering edit mode
zlskidmore ▴ 30
@zlskidmore-9639
Last seen 5 months ago
United States

Hi All, I was wondering if anyone knew a way to get both the intersect and difference of a grange object while expanding out metadata columns. In other words if I had a grange object like this:

gr1 <- GRanges(seqnames=c(1), ranges=IRanges(start=c(1), end=c(5)), strand=c("*"), mcols=c("a"))

GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |       mcols
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1    [1, 5]      * |           a
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

and this:

 

gr2 <- GRanges(seqnames=c(1), ranges=IRanges(start=c(1), end=c(10)), strand=c("*"), mcols=c("b"))

GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |       mcols
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1   [1, 10]      * |           b
  -------

  seqinfo: 1 sequence from an unspecified genome; no seqlengths

I would expect output like this:

GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |       mcols
         <Rle> <IRanges>  <Rle> | <character>
  [1]        1   [1,  5]      * |           a
  [2]        1   [1,  5]      * |           b
  [3]        1   [5, 10]      * |           b
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

genomicranges intersect • 4.7k views
ADD COMMENT
3
Entering edit mode
@herve-pages-1542
Last seen 1 day ago
Seattle, WA, United States

Hi,

Maybe you want to consider using disjoin() for that:

library(GenomicRanges)
x <- GRanges("chr1:8-15", score=0.11)
y <- GRanges("chr1:11-20", score=0.22)
xy <- c(x, y)
d <- disjoin(xy)
d
# GRanges object with 3 ranges and 0 metadata columns:
#       seqnames    ranges strand
#          <Rle> <IRanges>  <Rle>
#   [1]     chr1  [ 8, 10]      *
#   [2]     chr1  [11, 15]      *
#   [3]     chr1  [16, 20]      *

See ?disjoin for more information.

disjoin() (like reduce() and the inter range transformations in general) doesn't propagate the metadata columns. That's because a range in the output can be mapped to more than 1 range in the input (one-to-many mapping) so there is no canonical/obvious way to do this. Note that reduce() has a with.revmap argument that allows the user to obtain this mapping:

z <- GRanges("chr1:1-5", score=0.33)
xyz <- c(x, y, z)
r <- reduce(xyz, with.revmap=TRUE)
r
# GRanges object with 2 ranges and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#   [1]     chr1   [1,  5]      * |             3
#   [2]     chr1   [8, 20]      * |           1,2

Metadata column revmap describes the mapping from the output ranges to the input ranges. The user can use it to propagate the metadata columns:

revmap <- mcols(r)$revmap
r_scores <- extractList(mcols(xyz)$score, revmap)
r_scores
# NumericList of length 2
# [[1]] 0.33
# [[2]] 0.11 0.22

r_scores is a list-like object parallel to the output (i.e. the i-th list-element in r_scores corresponds to the i-th range in the output). You can stick it on the output:

mcols(r)$score <- r_scores
r
# GRanges object with 2 ranges and 2 metadata columns:
#       seqnames    ranges strand |        revmap         score
#          <Rle> <IRanges>  <Rle> | <IntegerList> <NumericList>
#   [1]     chr1   [1,  5]      * |             3          0.33
#   [2]     chr1   [8, 20]      * |           1,2     0.11,0.22

But you can also summarize it first:

mcols(r)$score <- mean(r_scores)
r
# GRanges object with 2 ranges and 2 metadata columns:
#       seqnames    ranges strand |        revmap     score
#          <Rle> <IRanges>  <Rle> | <IntegerList> <numeric>
#   [1]     chr1   [1,  5]      * |             3     0.330
#   [2]     chr1   [8, 20]      * |           1,2     0.165

Note that most summarization functions (mean, max, min, sum, etc...) work out-of-the-box on NumericList objects.

Unfortunately disjoin() doesn't support this argument yet (but will soon), so for now you need to compute the reverse mapping yourself. This is easy to do with findOverlaps():

revmap <- as(findOverlaps(d, xy), "List")
revmap
# IntegerList of length 3
# [[1]] 1
# [[2]] 1 2
# [[3]] 2

Then you can use revmap to propagate the scores by summarizing them in a way similar to what I did above with reduce():

d_scores <- extractList(mcols(xy)$score, revmap)
mcols(d)$score <- mean(d_scores)

Alternatively, if you want to expand d i.e. repeat each range in d as many times as the number of scores associated to it, you can do:

d <- rep(d, lengths(d_scores))
mcols(d)$score <- unlist(d_scores)
d
# GRanges object with 4 ranges and 1 metadata column:
#       seqnames    ranges strand |     score
#          <Rle> <IRanges>  <Rle> | <numeric>
#   [1]     chr1  [ 8, 10]      * |      0.11
#   [2]     chr1  [11, 15]      * |      0.11
#   [3]     chr1  [11, 15]      * |      0.22
#   [4]     chr1  [16, 20]      * |      0.22

Cheers,

H.

ADD COMMENT
2
Entering edit mode
shepherl 3.8k
@lshep
Last seen 17 minutes ago
United States

Hi

The with.revmap argument has been implemented for disjoin() and can be utilized as shown above.  A metadata column revmap describes the mapping from the output ranges to the input ranges. It is available in the devel version of Bioconductor.  You will need IRanges >= 2.7.5  and GenomicRanges 1.25.4

Cheers, 

Lori 

ADD COMMENT

Login before adding your answer.

Traffic: 931 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6