Efficient coverage calculation with RleList and GRanges
1
0
Entering edit mode
@christian-ruckert-3294
Last seen 5.5 years ago
Germany

I am interested in calculating the average coverage and the percent of bases covered more than x-times for a large number of regions.

My current and working approach is:

1. Calculate the coverage for every position in the genome (cov is a SimpleRleList):

cov <- coverage(BamFile)

2. Subset to get the coverage for each target region given as a GRanges object (targets_cov is a CompressedRleList):

targets_cov <- cov[targets]

3. Calculate the mean and more than x-times values:

sapply(targets_cov, mean) or mean(targets_cov) respectively
sapply(targets_cov, function(x){sum(x >= 50)}) / length * 100

Step 3. gets very slow for GRanges with many target regions (>200000). Would a solution using Views, viewMeans and Slices be faster? And how do I create this view on my RleList using the GRanges target object (this was asked a few years ago, but never answered).

 

Kind regards,

Christian

 

granges Rle coverage View mean • 1.8k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States

Wow, you are right, mean,CompressedRleList was never optimized. It now is in devel. For now, you can just coerce to a Views with as(targets_cov, "RleViews").

Also, to speed up the second calculation, use sum(targets_cov >= 50). That should be fast.

ADD COMMENT

Login before adding your answer.

Traffic: 550 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6