How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.
How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.
[Edited on Jul 23, 2017: Simplified and fixed bug in circularReduceIRanges(). Also added the min.gapwidth argument.]
Hi,
Indeed, reduce() does not handle circularity yet.
The circularReduceGRanges() function below implements a circularity-aware reduction for GRanges objects:
seqinfo <- Seqinfo("chr1", seqlengths=100, isCircular=TRUE)
gr <- GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
              seqinfo=seqinfo)
circularReduceGRanges(gr, min.gapwidth=10)
# GRanges object with 1 range and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#   [1]     chr1 [95, 110]      * |           2,1
#   -------
#  seqinfo: 1 sequence (1 circular) from an unspecified genome
With a more complicated object:
seqinfo <- Seqinfo(seqnames=c("chr1", "chr2"),
                   seqlengths=c(100, NA),
                   isCircular=c(TRUE, FALSE))
gr2 <- GRanges(Rle(c("chr2", "chr1"), c(3, 7)),
               IRanges(c(105,20, 44, 8, 95,103,25,55, 2,21),
                       c(222,35,104,15,101,105,30,60,10,28)),
               seqinfo=seqinfo)
circularReduceGRanges(gr2)
# GRanges object with 5 ranges and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#   [1]     chr1 [21,  30]      * |          10,7
#   [2]     chr1 [55,  60]      * |             8
#   [3]     chr1 [95, 115]      * |     5,9,6,...
#   [4]     chr2 [20,  35]      * |             2
#   [5]     chr2 [44, 222]      * |           3,1
#   -------
#   seqinfo: 2 sequences (1 circular) from an unspecified genome
Lightly tested only...
Cheers,
H.
circularReduceIRanges <- function(x, min.gapwidth=1L, circle.length=NA)
{
    stopifnot(is(x, "IRanges"), isSingleNumberOrNA(circle.length))
    if (is.na(circle.length))
        return(reduce(x, min.gapwidth=min.gapwidth, with.revmap=TRUE))
    circle.length <- as.integer(circle.length)
    stopifnot(circle.length >= 1L)
    x <- IRanges:::.shift_ranges_to_first_circle(x, circle.length)
    ans <- reduce(c(x, shift(x, circle.length)),
                  min.gapwidth=min.gapwidth, with.revmap=TRUE)
    ans <- ans[start(ans) <= circle.length]
    mcols(ans)$revmap <- (mcols(ans)$revmap - 1L) %% length(x) + 1L
    ans_len <- length(ans)
    # Jul 23, 2017: Found a bug in the original function and fixed it by
    # replacing its last lines with the following lines.
    if (ans_len <= 1L) 
        return(ans)
    last_end <- end(ans)[[ans_len]]
    keep_idx <- which(start(ans) > last_end - circle.length)
    if (length(keep_idx) == 0L)
        keep_idx <- ans_len
    ans[keep_idx]
}
circularReduceGRanges <- function(x, min.gapwidth=1L)
{
    stopifnot(is(x, "GRanges"))
    rgl <- GenomicRanges:::deconstructGRintoRGL(x, with.revmap=TRUE)
    circle_lens <- seqlengths(x)
    circle_lens[!(isCircular(x) %in% TRUE)] <- NA
    circle_lens <- rep(circle_lens, each=3L)
    rgl2 <- IRangesList(mapply(circularReduceIRanges, rgl, circle_lens,
                               MoreArgs=list(min.gapwidth=min.gapwidth)))
    rgl2 <- GenomicRanges:::.fix_inner_revmap_mcol(rgl2, rgl)
    GenomicRanges:::reconstructGRfromRGL(rgl2, x)
}
                    
                
                Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Do you have an example?
Here is an example:
> seqinfo <- Seqinfo("chr1",
+ seqlengths=100,
+ isCircular=TRUE)
>
> gr=GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
+ strand="*",seqinfo=seqinfo)
>
> reduce(gr,min.gapwidth=10)
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [ 5, 10] *
[2] chr1 [95, 101] *
-------
seqinfo: 1 sequence (1 circular) from an unspecified genome
Ideally, after reduce() the gr should contain one interval: [95, 110]