Question: reduce function in GenomicRanges
0
Frocha10 wrote:

How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.

genomicranges reduce • 1.1k views  modified 2.4 years ago by Hervé Pagès ♦♦ 14k • written 2.4 years ago by Frocha10

Do you have an example?

Here is an example:

> seqinfo <- Seqinfo("chr1",
+ seqlengths=100,
+ isCircular=TRUE)
>
> gr=GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
+ strand="*",seqinfo=seqinfo)
>
> reduce(gr,min.gapwidth=10)
GRanges object with 2 ranges and 0 metadata columns:
seqnames    ranges strand
<Rle> <IRanges>  <Rle>
     chr1 [ 5,  10]      *
     chr1 [95, 101]      *
-------
seqinfo: 1 sequence (1 circular) from an unspecified genome

Ideally, after reduce() the gr should contain one interval: [95, 110]

1
Hervé Pagès ♦♦ 14k wrote:

[Edited on Jul 23, 2017: Simplified and fixed bug in circularReduceIRanges(). Also added the min.gapwidth argument.]

Hi,

Indeed, reduce() does not handle circularity yet.

The circularReduceGRanges() function below implements a circularity-aware reduction for GRanges objects:

seqinfo <- Seqinfo("chr1", seqlengths=100, isCircular=TRUE)
gr <- GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
seqinfo=seqinfo)
circularReduceGRanges(gr, min.gapwidth=10)
# GRanges object with 1 range and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#        chr1 [95, 110]      * |           2,1
#   -------
#  seqinfo: 1 sequence (1 circular) from an unspecified genome


With a more complicated object:

seqinfo <- Seqinfo(seqnames=c("chr1", "chr2"),
seqlengths=c(100, NA),
isCircular=c(TRUE, FALSE))
gr2 <- GRanges(Rle(c("chr2", "chr1"), c(3, 7)),
IRanges(c(105,20, 44, 8, 95,103,25,55, 2,21),
c(222,35,104,15,101,105,30,60,10,28)),
seqinfo=seqinfo)
circularReduceGRanges(gr2)
# GRanges object with 5 ranges and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#        chr1 [21,  30]      * |          10,7
#        chr1 [55,  60]      * |             8
#        chr1 [95, 115]      * |     5,9,6,...
#        chr2 [20,  35]      * |             2
#        chr2 [44, 222]      * |           3,1
#   -------
#   seqinfo: 2 sequences (1 circular) from an unspecified genome

Lightly tested only...

Cheers,
H.

circularReduceIRanges <- function(x, min.gapwidth=1L, circle.length=NA)
{
stopifnot(is(x, "IRanges"), isSingleNumberOrNA(circle.length))
if (is.na(circle.length))
return(reduce(x, min.gapwidth=min.gapwidth, with.revmap=TRUE))
circle.length <- as.integer(circle.length)
stopifnot(circle.length >= 1L)

x <- IRanges:::.shift_ranges_to_first_circle(x, circle.length)
ans <- reduce(c(x, shift(x, circle.length)),
min.gapwidth=min.gapwidth, with.revmap=TRUE)
ans <- ans[start(ans) <= circle.length]
mcols(ans)$revmap <- (mcols(ans)$revmap - 1L) %% length(x) + 1L
ans_len <- length(ans)
# Jul 23, 2017: Found a bug in the original function and fixed it by
# replacing its last lines with the following lines.
if (ans_len <= 1L)
return(ans)
last_end <- end(ans)[[ans_len]]
keep_idx <- which(start(ans) > last_end - circle.length)
if (length(keep_idx) == 0L)
keep_idx <- ans_len
ans[keep_idx]
}

circularReduceGRanges <- function(x, min.gapwidth=1L)
{
stopifnot(is(x, "GRanges"))
rgl <- GenomicRanges:::deconstructGRintoRGL(x, with.revmap=TRUE)
circle_lens <- seqlengths(x)
circle_lens[!(isCircular(x) %in% TRUE)] <- NA
circle_lens <- rep(circle_lens, each=3L)
rgl2 <- IRangesList(mapply(circularReduceIRanges, rgl, circle_lens,
MoreArgs=list(min.gapwidth=min.gapwidth)))
rgl2 <- GenomicRanges:::.fix_inner_revmap_mcol(rgl2, rgl)
GenomicRanges:::reconstructGRfromRGL(rgl2, x)
}


Thank you very much!

However, I wish to have a function which allows min.gapwidth option. Is there a way to do so?

Oh I see now why you expected ranges chr1:5-10 and chr1:95-101 to be merged. Sorry for missing that. I modified my answer above to support the min.gapwidth argument. Also fixed a bug in circularReduceIRanges().

Cheers,

H.