Question: Efficient coverage calculation with RleList and GRanges
0
3.2 years ago by
Germany
Christian Ruckert170 wrote:

I am interested in calculating the average coverage and the percent of bases covered more than x-times for a large number of regions.

My current and working approach is:

1. Calculate the coverage for every position in the genome (cov is a SimpleRleList):

cov <- coverage(BamFile)

2. Subset to get the coverage for each target region given as a GRanges object (targets_cov is a CompressedRleList):

targets_cov <- cov[targets]

3. Calculate the mean and more than x-times values:

sapply(targets_cov, mean) or mean(targets_cov) respectively
sapply(targets_cov, function(x){sum(x >= 50)}) / length * 100

Step 3. gets very slow for GRanges with many target regions (>200000). Would a solution using Views, viewMeans and Slices be faster? And how do I create this view on my RleList using the GRanges target object (this was asked a few years ago, but never answered).

Kind regards,

Christian

coverage granges rle view mean • 800 views
modified 3.2 years ago by Michael Lawrence11k • written 3.2 years ago by Christian Ruckert170
Answer: Efficient coverage calculation with RleList and GRanges
0
3.2 years ago by
United States
Michael Lawrence11k wrote:

Wow, you are right, mean,CompressedRleList was never optimized. It now is in devel. For now, you can just coerce to a Views with as(targets_cov, "RleViews").

Also, to speed up the second calculation, use sum(targets_cov >= 50). That should be fast.