How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.
How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.
[Edited on Jul 23, 2017: Simplified and fixed bug in circularReduceIRanges()
. Also added the min.gapwidth
argument.]
Hi,
Indeed, reduce()
does not handle circularity yet.
The circularReduceGRanges()
function below implements a circularity-aware reduction for GRanges objects:
seqinfo <- Seqinfo("chr1", seqlengths=100, isCircular=TRUE) gr <- GRanges("chr1", IRanges(c(5, 95), c(10, 101)), seqinfo=seqinfo) circularReduceGRanges(gr, min.gapwidth=10) # GRanges object with 1 range and 1 metadata column: # seqnames ranges strand | revmap # <Rle> <IRanges> <Rle> | <IntegerList> # [1] chr1 [95, 110] * | 2,1 # ------- # seqinfo: 1 sequence (1 circular) from an unspecified genome
With a more complicated object:
seqinfo <- Seqinfo(seqnames=c("chr1", "chr2"), seqlengths=c(100, NA), isCircular=c(TRUE, FALSE)) gr2 <- GRanges(Rle(c("chr2", "chr1"), c(3, 7)), IRanges(c(105,20, 44, 8, 95,103,25,55, 2,21), c(222,35,104,15,101,105,30,60,10,28)), seqinfo=seqinfo) circularReduceGRanges(gr2) # GRanges object with 5 ranges and 1 metadata column: # seqnames ranges strand | revmap # <Rle> <IRanges> <Rle> | <IntegerList> # [1] chr1 [21, 30] * | 10,7 # [2] chr1 [55, 60] * | 8 # [3] chr1 [95, 115] * | 5,9,6,... # [4] chr2 [20, 35] * | 2 # [5] chr2 [44, 222] * | 3,1 # ------- # seqinfo: 2 sequences (1 circular) from an unspecified genome
Lightly tested only...
Cheers,
H.
circularReduceIRanges <- function(x, min.gapwidth=1L, circle.length=NA) { stopifnot(is(x, "IRanges"), isSingleNumberOrNA(circle.length)) if (is.na(circle.length)) return(reduce(x, min.gapwidth=min.gapwidth, with.revmap=TRUE)) circle.length <- as.integer(circle.length) stopifnot(circle.length >= 1L) x <- IRanges:::.shift_ranges_to_first_circle(x, circle.length) ans <- reduce(c(x, shift(x, circle.length)), min.gapwidth=min.gapwidth, with.revmap=TRUE) ans <- ans[start(ans) <= circle.length] mcols(ans)$revmap <- (mcols(ans)$revmap - 1L) %% length(x) + 1L ans_len <- length(ans) # Jul 23, 2017: Found a bug in the original function and fixed it by # replacing its last lines with the following lines. if (ans_len <= 1L) return(ans) last_end <- end(ans)[[ans_len]] keep_idx <- which(start(ans) > last_end - circle.length) if (length(keep_idx) == 0L) keep_idx <- ans_len ans[keep_idx] } circularReduceGRanges <- function(x, min.gapwidth=1L) { stopifnot(is(x, "GRanges")) rgl <- GenomicRanges:::deconstructGRintoRGL(x, with.revmap=TRUE) circle_lens <- seqlengths(x) circle_lens[!(isCircular(x) %in% TRUE)] <- NA circle_lens <- rep(circle_lens, each=3L) rgl2 <- IRangesList(mapply(circularReduceIRanges, rgl, circle_lens, MoreArgs=list(min.gapwidth=min.gapwidth))) rgl2 <- GenomicRanges:::.fix_inner_revmap_mcol(rgl2, rgl) GenomicRanges:::reconstructGRfromRGL(rgl2, x) }
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Do you have an example?
Here is an example:
> seqinfo <- Seqinfo("chr1",
+ seqlengths=100,
+ isCircular=TRUE)
>
> gr=GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
+ strand="*",seqinfo=seqinfo)
>
> reduce(gr,min.gapwidth=10)
GRanges object with 2 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [ 5, 10] *
[2] chr1 [95, 101] *
-------
seqinfo: 1 sequence (1 circular) from an unspecified genome
Ideally, after reduce() the gr should contain one interval: [95, 110]