Question

Shrink or Expand RLE

0

Entering edit mode

gen ▴ 30

@gen-7383

Last seen 9.0 years ago

United States

If you have an RLE like this...

integer-Rle of length 1575 with 42 runs
  Lengths: 982   1   1   2   3   2 ...   1   1   3   1 531
  Values :   1   3   4   7   9  11 ...   8   6   5   3   1

how could you compress or expand the RLE along the x axis so the resulting graph compresses or expands as so?

If not directly is there an indirect way to get the graph or the RLE to shrink or expand?

RLE iranges granges coverage • 2.8k views

ADD COMMENT • link updated 9.0 years ago by Martin Morgan 25k • written 9.0 years ago by gen ▴ 30

0

Entering edit mode

I can't really understand what you want to do. Maybe present a simple example with a plain-old-vector?

ADD REPLY • link 9.0 years ago Martin Morgan 25k

0

Entering edit mode

Basically I want to shrink or expand a coverage range and its reads along the x axis. In the ascii example plots I showed above the first range is compressed into 1/2 its size. In the second example plot it is expanded to twice its size.

The motivation is that I have read sets for ranges of several different lengths and I want to normalize them to the same length so that I can better compare their coverages together (ie make a mean coverage graph). For example

Read Set 1 are reads overlapping range 1-1000

Read Set 2 are reads overlapping range 2000-2500

Goal: Resize Read Set 2 to length 1000 by expanding all reads along the range by x 2

I haven't been able to find a function that will allow me to do this for an integer rle.

Alternatively I could try to work with the GrangesList I use to generate the integer rle. There is a resize() function for GRanges but there are two issues. The first is that I haven't seen how to resize by a factor rather than a constant number. Other than maybe going through the reads one by one with

resize(readsRange1[x],resizeFactor*width(readsRange1[x]))

And when resizing the reads are anchored at the start or the end and so expand or compress in only one direction when resizing. Which I don't think is how they should behave if you were truly symmetrically resizing the range? If I could solve the first problem at least that would be dandy.

ADD REPLY • link 9.0 years ago gen ▴ 30

0

Entering edit mode

Hi,

Not sure what you mean by resizing the reads "one by one". You seem to imply the need to use a loop. However this is not necessary because the width argument can be a vector of the length of x. Also resize() supports fix="center" for symmetrical resizing. Here is an example with an IRanges object:

ir <- IRanges(c(1, 100, 200), width=c(20, 80, 50))
ir
# IRanges of length 3
#     start end width
# [1]     1  20    20
# [2]   100 179    80
# [3]   200 249    50

resizeFactor <- 0.5

resize(ir, resizeFactor * width(ir), fix="center")
# IRanges of length 3
#     start end width
# [1]     6  15    10
# [2]   120 159    40
# [3]   212 236    25

H.

ADD REPLY • link 9.0 years ago Hervé Pagès 16k

score 1 · Answer 1 · 2015-04-04

I loaded the IRanges library

library(IRanges)

Here's a coverage vector of length 10, and a desired output length, in this case a 'stretch'

x <- c(2, 2, 0, 0 , 2, 2, 0, 0, 2, 2)
length.out <- 13

I replicated each element of x 13 times

x13 <- rep(x, each=length.out)

then calculated the mean of the resulting vector in windows of size 10, using a little trick involving how R represents a matrix

length.in <- length(x)
colMeans(matrix(x13, length.in))

or as a one-liner

> colMeans(matrix(rep(x, each=length.out), length.in))
[1] 2.0 2.0 1.2 0.0 0.0 1.6 2.0 1.6 0.0 0.0 1.2 2.0 2.0

This is our new coverage vector, which could be compressed as an Rle.

> Rle(colMeans(matrix(rep(x, each=length.out), length.in)))
numeric-Rle of length 13 with 9 runs
  Lengths:   2   1   2   1   1   1   2   1   2
  Values :   2 1.2   0 1.6   2 1.6   0 1.2   2

This approach works for 'stretch' as well as 'shrink'. It could be made more efficient by replicating / averaging using the greatest common denominator of the in and out lengths. It generates large intermediate vectors, and this could be prohibitive. The Rle approach to replication is to multiply the run lengths by the stretch factor

r <- Rle(x)
runLength(r) <- length.out * runLength(r)

Calculation of new means can be done using views

mean(Views(r, breakInChunks(length(r), length.in)))

This seems to be more memory efficient than the matrix approach. So in the end we end up with

stretch <- function(r, length.out=length(x))
{
    stopifnot(length.out > 0)
    r <- as(r, "Rle")
    length.in <- length(r)
    runLength(r) <- length.out * runLength(r)
    Rle(mean(Views(r, breakInChunks(length(r), length.in))))
}

I'm not sure that is what you were looking for, or whether there is a more efficient way to do this.