I am interested in calculating the average coverage and the percent of bases covered more than x-times for a large number of regions.
My current and working approach is:
1. Calculate the coverage for every position in the genome (cov is a SimpleRleList):
cov <- coverage(BamFile)
2. Subset to get the coverage for each target region given as a GRanges object (targets_cov is a CompressedRleList):
targets_cov <- cov[targets]
3. Calculate the mean and more than x-times values:
sapply(targets_cov, mean) or mean(targets_cov) respectively
sapply(targets_cov, function(x){sum(x >= 50)}) / length * 100
Step 3. gets very slow for GRanges with many target regions (>200000). Would a solution using Views, viewMeans and Slices be faster? And how do I create this view on my RleList using the GRanges target object (this was asked a few years ago, but never answered).
Kind regards,
Christian