How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.
How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.
[Edited on Jul 23, 2017: Simplified and fixed bug in circularReduceIRanges(). Also added the min.gapwidth argument.]
Hi,
Indeed, reduce() does not handle circularity yet.
The circularReduceGRanges() function below implements a circularity-aware reduction for GRanges objects:
seqinfo <- Seqinfo("chr1", seqlengths=100, isCircular=TRUE)
gr <- GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
seqinfo=seqinfo)
circularReduceGRanges(gr, min.gapwidth=10)
# GRanges object with 1 range and 1 metadata column:
# seqnames ranges strand | revmap
# <Rle> <IRanges> <Rle> | <IntegerList>
# [1] chr1 [95, 110] * | 2,1
# -------
# seqinfo: 1 sequence (1 circular) from an unspecified genome
With a more complicated object:
seqinfo <- Seqinfo(seqnames=c("chr1", "chr2"),
seqlengths=c(100, NA),
isCircular=c(TRUE, FALSE))
gr2 <- GRanges(Rle(c("chr2", "chr1"), c(3, 7)),
IRanges(c(105,20, 44, 8, 95,103,25,55, 2,21),
c(222,35,104,15,101,105,30,60,10,28)),
seqinfo=seqinfo)
circularReduceGRanges(gr2)
# GRanges object with 5 ranges and 1 metadata column:
# seqnames ranges strand | revmap
# <Rle> <IRanges> <Rle> | <IntegerList>
# [1] chr1 [21, 30] * | 10,7
# [2] chr1 [55, 60] * | 8
# [3] chr1 [95, 115] * | 5,9,6,...
# [4] chr2 [20, 35] * | 2
# [5] chr2 [44, 222] * | 3,1
# -------
# seqinfo: 2 sequences (1 circular) from an unspecified genome
Lightly tested only...
Cheers,
H.
circularReduceIRanges <- function(x, min.gapwidth=1L, circle.length=NA)
{
stopifnot(is(x, "IRanges"), isSingleNumberOrNA(circle.length))
if (is.na(circle.length))
return(reduce(x, min.gapwidth=min.gapwidth, with.revmap=TRUE))
circle.length <- as.integer(circle.length)
stopifnot(circle.length >= 1L)
x <- IRanges:::.shift_ranges_to_first_circle(x, circle.length)
ans <- reduce(c(x, shift(x, circle.length)),
min.gapwidth=min.gapwidth, with.revmap=TRUE)
ans <- ans[start(ans) <= circle.length]
mcols(ans)$revmap <- (mcols(ans)$revmap - 1L) %% length(x) + 1L
ans_len <- length(ans)
# Jul 23, 2017: Found a bug in the original function and fixed it by
# replacing its last lines with the following lines.
if (ans_len <= 1L)
return(ans)
last_end <- end(ans)[[ans_len]]
keep_idx <- which(start(ans) > last_end - circle.length)
if (length(keep_idx) == 0L)
keep_idx <- ans_len
ans[keep_idx]
}
circularReduceGRanges <- function(x, min.gapwidth=1L)
{
stopifnot(is(x, "GRanges"))
rgl <- GenomicRanges:::deconstructGRintoRGL(x, with.revmap=TRUE)
circle_lens <- seqlengths(x)
circle_lens[!(isCircular(x) %in% TRUE)] <- NA
circle_lens <- rep(circle_lens, each=3L)
rgl2 <- IRangesList(mapply(circularReduceIRanges, rgl, circle_lens,
MoreArgs=list(min.gapwidth=min.gapwidth)))
rgl2 <- GenomicRanges:::.fix_inner_revmap_mcol(rgl2, rgl)
GenomicRanges:::reconstructGRfromRGL(rgl2, x)
}
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Do you have an example?
Here is an example:
> seqinfo <- Seqinfo("chr1",
+ seqlengths=100,
+ isCircular=TRUE)
>
> gr=GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
+ strand="*",seqinfo=seqinfo)
>
> reduce(gr,min.gapwidth=10)
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [ 5, 10] *
[2] chr1 [95, 101] *
-------
seqinfo: 1 sequence (1 circular) from an unspecified genome
Ideally, after reduce() the gr should contain one interval: [95, 110]