Question: Views with RleLists and IRangesLists
0
3.3 years ago by
ccanchaya0 wrote:

Hi,

I am trying to have the read coverage on transcripts of a non-model genome.  I created an GenomicAlignment object from the bam file and I have the annotations usin txdb object. I would like to use Views to extract the coverage over the regions I am interested.

reads<-readGAlignments("/Volumes/FT2/octopus/analyses/clean/17/17.order.RG.bam")
cvg_cds <- cdsBy(txdb, by="tx", use.names=FALSE)
cds_views<-Views(unlist(cvg),unlist(ranges(cvg_cds)))
slice(cds_views,lower=3)

However it takes ages to process this.  I think that unlisting the GenomicRangesList (cvg_cds) and unlisting Rlelist (cvg) would be the problem but it was the only way to work with the data.

Would there be a faster way to extract the read coverage in a genome over a list of ranges?

Best regards,

Carlos

modified 3.3 years ago by Hervé Pagès ♦♦ 14k • written 3.3 years ago by ccanchaya0
Answer: Views with RleLists and IRangesLists
0
3.3 years ago by
Hervé Pagès ♦♦ 14k
United States
Hervé Pagès ♦♦ 14k wrote:

Hi Carlos,

IMPORTANT: When you unlist cvg, you end up with a single Rle object representing the coverage along a single virtual sequence made of the concatenation of all the chromosomes in the genome. So the coordinate system you need to use to specify ranges along that single virtual sequence is not the same as before the unlisting. Before the unlisting, it was a local one i.e. 1 coordinate system per-chromosome. After the unlisting, it becomes a global one i.e. 1 single coordinate system along 1 single virtual sequence.

That means you can't just unlist the ranges in cvg_cds and use them as-is to define views on unlist(cvg). That gives you incorrect view coordinates. Instead you need to do something like:

cds_views <- Views(unlist(cvg), absoluteRanges(unlist(cvg_cds)))

See ?absoluteRanges for the details.

Now back to "how to extract the read coverage in a genome over a list of ranges"

Using views for this is not as convenient as it could be at the moment but we will try to improve the situation in the near future by implementing containers for genomic views (more details on the bioc-devel mailing list here https://stat.ethz.ch/pipermail/bioc-devel/2016-June/009433.html if you are curious about this).

Anyway, right now this can be done by just subsetting your RleList object (cvg in your case) with the GRanges object containing your ranges of interest:

cds <- unlist(cvg_cds, use.names=FALSE)

Note that before we can subset cvg with cds we need to drop the sequence levels (i.e. chromosomes) from the latter that are not represented in the former:

seqlevels(cds, force=TRUE) <- names(cvg)

Then we can subset:

cvg[cds]  # RleList object with 1 list element per range in cds

Last but not least: You might want to check coverageByTranscript() in GenomicFeatures. It might actually do exactly what you want.

Cheers,

H.