3.2 years ago by
IMPORTANT: When you unlist
cvg, you end up with a single Rle object representing the coverage along a single virtual sequence made of the concatenation of all the chromosomes in the genome. So the coordinate system you need to use to specify ranges along that single virtual sequence is not the same as before the unlisting. Before the unlisting, it was a local one i.e. 1 coordinate system per-chromosome. After the unlisting, it becomes a global one i.e. 1 single coordinate system along 1 single virtual sequence.
That means you can't just unlist the ranges in
cvg_cds and use them as-is to define views on
unlist(cvg). That gives you incorrect view coordinates. Instead you need to do something like:
cds_views <- Views(unlist(cvg), absoluteRanges(unlist(cvg_cds)))
?absoluteRanges for the details.
Now back to "how to extract the read coverage in a genome over a list of ranges"
Using views for this is not as convenient as it could be at the moment but we will try to improve the situation in the near future by implementing containers for genomic views (more details on the bioc-devel mailing list here https://stat.ethz.ch/pipermail/bioc-devel/2016-June/009433.html if you are curious about this).
Anyway, right now this can be done by just subsetting your RleList object (
cvg in your case) with the GRanges object containing your ranges of interest:
cds <- unlist(cvg_cds, use.names=FALSE)
Note that before we can subset
cds we need to drop the sequence levels (i.e. chromosomes) from the latter that are not represented in the former:
seqlevels(cds, force=TRUE) <- names(cvg)
Then we can subset:
cvg[cds] # RleList object with 1 list element per range in cds
Last but not least: You might want to check
coverageByTranscript() in GenomicFeatures. It might actually do exactly what you want.