Search
Question: reduce function in GenomicRanges
0
gravatar for Frocha
4 months ago by
Frocha0
Frocha0 wrote:

How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.

ADD COMMENTlink modified 4 months ago by Hervé Pagès ♦♦ 13k • written 4 months ago by Frocha0

Do you have an example?

ADD REPLYlink written 4 months ago by James W. MacDonald45k

Here is an example:

> seqinfo <- Seqinfo("chr1",
+ seqlengths=100,
+ isCircular=TRUE)
>
> gr=GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
+ strand="*",seqinfo=seqinfo)
>
> reduce(gr,min.gapwidth=10)
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1 [ 5,  10]      *
  [2]     chr1 [95, 101]      *
  -------
  seqinfo: 1 sequence (1 circular) from an unspecified genome

Ideally, after reduce() the gr should contain one interval: [95, 110]

 

ADD REPLYlink modified 4 months ago • written 4 months ago by Frocha0
1
gravatar for Hervé Pagès
4 months ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

[Edited on Jul 23, 2017: Simplified and fixed bug in circularReduceIRanges(). Also added the min.gapwidth argument.]

Hi,

Indeed, reduce() does not handle circularity yet.

The circularReduceGRanges() function below implements a circularity-aware reduction for GRanges objects:

seqinfo <- Seqinfo("chr1", seqlengths=100, isCircular=TRUE)
gr <- GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
              seqinfo=seqinfo)
circularReduceGRanges(gr, min.gapwidth=10)
# GRanges object with 1 range and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#   [1]     chr1 [95, 110]      * |           2,1
#   -------
#  seqinfo: 1 sequence (1 circular) from an unspecified genome

With a more complicated object:

seqinfo <- Seqinfo(seqnames=c("chr1", "chr2"),
                   seqlengths=c(100, NA),
                   isCircular=c(TRUE, FALSE))
gr2 <- GRanges(Rle(c("chr2", "chr1"), c(3, 7)),
               IRanges(c(105,20, 44, 8, 95,103,25,55, 2,21),
                       c(222,35,104,15,101,105,30,60,10,28)),
               seqinfo=seqinfo)
circularReduceGRanges(gr2)
# GRanges object with 5 ranges and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#   [1]     chr1 [21,  30]      * |          10,7
#   [2]     chr1 [55,  60]      * |             8
#   [3]     chr1 [95, 115]      * |     5,9,6,...
#   [4]     chr2 [20,  35]      * |             2
#   [5]     chr2 [44, 222]      * |           3,1
#   -------
#   seqinfo: 2 sequences (1 circular) from an unspecified genome

Lightly tested only...

Cheers,
H.

circularReduceIRanges <- function(x, min.gapwidth=1L, circle.length=NA)
{
    stopifnot(is(x, "IRanges"), isSingleNumberOrNA(circle.length))
    if (is.na(circle.length))
        return(reduce(x, min.gapwidth=min.gapwidth, with.revmap=TRUE))
    circle.length <- as.integer(circle.length)
    stopifnot(circle.length >= 1L)

    x <- IRanges:::.shift_ranges_to_first_circle(x, circle.length)
    ans <- reduce(c(x, shift(x, circle.length)),
                  min.gapwidth=min.gapwidth, with.revmap=TRUE)
    ans <- ans[start(ans) <= circle.length]
    mcols(ans)$revmap <- (mcols(ans)$revmap - 1L) %% length(x) + 1L
    ans_len <- length(ans)
    # Jul 23, 2017: Found a bug in the original function and fixed it by
    # replacing its last lines with the following lines.
    if (ans_len <= 1L) 
        return(ans)
    last_end <- end(ans)[[ans_len]]
    keep_idx <- which(start(ans) > last_end - circle.length)
    if (length(keep_idx) == 0L)
        keep_idx <- ans_len
    ans[keep_idx]
}

circularReduceGRanges <- function(x, min.gapwidth=1L)
{
    stopifnot(is(x, "GRanges"))
    rgl <- GenomicRanges:::deconstructGRintoRGL(x, with.revmap=TRUE)
    circle_lens <- seqlengths(x)
    circle_lens[!(isCircular(x) %in% TRUE)] <- NA
    circle_lens <- rep(circle_lens, each=3L)
    rgl2 <- IRangesList(mapply(circularReduceIRanges, rgl, circle_lens,
                               MoreArgs=list(min.gapwidth=min.gapwidth)))
    rgl2 <- GenomicRanges:::.fix_inner_revmap_mcol(rgl2, rgl)
    GenomicRanges:::reconstructGRfromRGL(rgl2, x)
}
ADD COMMENTlink modified 3 months ago • written 4 months ago by Hervé Pagès ♦♦ 13k

Thank you very much!

However, I wish to have a function which allows min.gapwidth option. Is there a way to do so?

ADD REPLYlink modified 4 months ago • written 4 months ago by Frocha0

Oh I see now why you expected ranges chr1:5-10 and chr1:95-101 to be merged. Sorry for missing that. I modified my answer above to support the min.gapwidth argument. Also fixed a bug in circularReduceIRanges().

Cheers,

H.

ADD REPLYlink written 3 months ago by Hervé Pagès ♦♦ 13k

Thank you very much!

ADD REPLYlink written 3 months ago by Frocha0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 191 users visited in the last hour