Question: Efficient coverage calculation with RleList and GRanges
0
gravatar for Christian Ruckert
3.4 years ago by
Germany
Christian Ruckert170 wrote:

I am interested in calculating the average coverage and the percent of bases covered more than x-times for a large number of regions.

My current and working approach is:

1. Calculate the coverage for every position in the genome (cov is a SimpleRleList):

cov <- coverage(BamFile)

2. Subset to get the coverage for each target region given as a GRanges object (targets_cov is a CompressedRleList):

targets_cov <- cov[targets]

3. Calculate the mean and more than x-times values:

sapply(targets_cov, mean) or mean(targets_cov) respectively
sapply(targets_cov, function(x){sum(x >= 50)}) / length * 100

Step 3. gets very slow for GRanges with many target regions (>200000). Would a solution using Views, viewMeans and Slices be faster? And how do I create this view on my RleList using the GRanges target object (this was asked a few years ago, but never answered).

 

Kind regards,

Christian

 

coverage granges rle view mean • 830 views
ADD COMMENTlink modified 3.4 years ago by Michael Lawrence11k • written 3.4 years ago by Christian Ruckert170
Answer: Efficient coverage calculation with RleList and GRanges
0
gravatar for Michael Lawrence
3.4 years ago by
United States
Michael Lawrence11k wrote:

Wow, you are right, mean,CompressedRleList was never optimized. It now is in devel. For now, you can just coerce to a Views with as(targets_cov, "RleViews").

Also, to speed up the second calculation, use sum(targets_cov >= 50). That should be fast.

ADD COMMENTlink written 3.4 years ago by Michael Lawrence11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 380 users visited in the last hour