Hi,
Maybe you want to consider using disjoin()
for that:
library(GenomicRanges)
x <- GRanges("chr1:8-15", score=0.11)
y <- GRanges("chr1:11-20", score=0.22)
xy <- c(x, y)
d <- disjoin(xy)
d
# GRanges object with 3 ranges and 0 metadata columns:
# seqnames ranges strand
# <Rle> <IRanges> <Rle>
# [1] chr1 [ 8, 10] *
# [2] chr1 [11, 15] *
# [3] chr1 [16, 20] *
See ?disjoin
for more information.
disjoin()
(like reduce()
and the inter range transformations in general) doesn't propagate the metadata columns. That's because a range in the output can be mapped to more than 1 range in the input (one-to-many mapping) so there is no canonical/obvious way to do this. Note that reduce()
has a with.revmap
argument that allows the user to obtain this mapping:
z <- GRanges("chr1:1-5", score=0.33)
xyz <- c(x, y, z)
r <- reduce(xyz, with.revmap=TRUE)
r
# GRanges object with 2 ranges and 1 metadata column:
# seqnames ranges strand | revmap
# <Rle> <IRanges> <Rle> | <IntegerList>
# [1] chr1 [1, 5] * | 3
# [2] chr1 [8, 20] * | 1,2
Metadata column revmap
describes the mapping from the output ranges to the input ranges. The user can use it to propagate the metadata columns:
revmap <- mcols(r)$revmap
r_scores <- extractList(mcols(xyz)$score, revmap)
r_scores
# NumericList of length 2
# [[1]] 0.33
# [[2]] 0.11 0.22
r_scores
is a list-like object parallel to the output (i.e. the i-th list-element in r_scores
corresponds to the i-th range in the output). You can stick it on the output:
mcols(r)$score <- r_scores
r
# GRanges object with 2 ranges and 2 metadata columns:
# seqnames ranges strand | revmap score
# <Rle> <IRanges> <Rle> | <IntegerList> <NumericList>
# [1] chr1 [1, 5] * | 3 0.33
# [2] chr1 [8, 20] * | 1,2 0.11,0.22
But you can also summarize it first:
mcols(r)$score <- mean(r_scores)
r
# GRanges object with 2 ranges and 2 metadata columns:
# seqnames ranges strand | revmap score
# <Rle> <IRanges> <Rle> | <IntegerList> <numeric>
# [1] chr1 [1, 5] * | 3 0.330
# [2] chr1 [8, 20] * | 1,2 0.165
Note that most summarization functions (mean
, max
, min
, sum
, etc...) work out-of-the-box on NumericList objects.
Unfortunately disjoin()
doesn't support this argument yet (but will soon), so for now you need to compute the reverse mapping yourself. This is easy to do with findOverlaps()
:
revmap <- as(findOverlaps(d, xy), "List")
revmap
# IntegerList of length 3
# [[1]] 1
# [[2]] 1 2
# [[3]] 2
Then you can use revmap
to propagate the scores by summarizing them in a way similar to what I did above with reduce()
:
d_scores <- extractList(mcols(xy)$score, revmap)
mcols(d)$score <- mean(d_scores)
Alternatively, if you want to expand d
i.e. repeat each range in d
as many times as the number of scores associated to it, you can do:
d <- rep(d, lengths(d_scores))
mcols(d)$score <- unlist(d_scores)
d
# GRanges object with 4 ranges and 1 metadata column:
# seqnames ranges strand | score
# <Rle> <IRanges> <Rle> | <numeric>
# [1] chr1 [ 8, 10] * | 0.11
# [2] chr1 [11, 15] * | 0.11
# [3] chr1 [11, 15] * | 0.22
# [4] chr1 [16, 20] * | 0.22
Cheers,
H.