Efficiently construct GRanges/IRanges from Rle vector
1
0
Entering edit mode
my4 ▴ 10
@my4-10129
Last seen 7.0 years ago

I have a Run length encoded vector representing some value at every position on the genome, in order. As a toy example suppose I had just one chromosome of length 10, then I would have a vector looking like

library(GenomicRanges)

set.seed(1)
toyData = Rle(sample(1:3,10,replace=TRUE))

I would like to coerce this into a GRanges object. The best I can come up with is

gr = GRanges('toyChr',IRanges(start(toyData)-1,
                              width=runLength(toyData)),
                              toyData = runValue(toyData))

which works, but is quite slow. Is there a faster way to construct the same object?

Rle granges iranges • 1.8k views
ADD COMMENT
0
Entering edit mode

For a chromosome length object, when I run this

    topData = Rle(sample(1:3,1e8,replace=TRUE))

    microbenchmark(GRanges('toyChr',IRanges(start(toyData)-1, width=runLength(toyData), toyData = runValue(toyData)), times = 3)

It takes roughly 15 seconds to convert

    Unit: seconds
                                                                                                             expr
     GRanges("toyChr", IRanges(start(toyData) - 1, width = runLength(toyData)),      toyData = runValue(toyData))
         min       lq     mean  median       uq      max neval
     14.8322 15.21285 15.58755 15.5935 15.96522 16.33694     3

ADD REPLY
0
Entering edit mode

Thanks, I edited my answer.

ADD REPLY
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.9 years ago
United States

On my machine, in devel, that example takes about 3 seconds. 1/3 of the time is spent making the IRanges, and the rest is basically in the validity check. 

Btw, I don't think you should subtract one from the start. IRanges is 1-based.

Calling as(x, "GRanges") is the canonical way to convert an RleList (multiple chromosomes) to a GRanges where each range represents a run in the Rle. I just rewrote it in devel so that it's a bit faster, but it's still going to be slow due to the overhead of lists. Would be nice to be able to skip some of the validity checking internally. What do people think?

ADD COMMENT

Login before adding your answer.

Traffic: 759 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6