Entering edit mode
I would like to use Bioconductor to calculate the depth and breadth of coverage from a given BAM file (as explained here). When I use the example BAM file provided with Rsamtools, I find the same results as when I use samtools in the cmd-line. But when I use my own BAM file, I find different results, and I am not sure why. Is it because I should set some flags in ScanBamParam? But which one(s)?
Here is my R code:
library(Rsamtools)
f <- system.file("extdata", "ex1.bam", package="Rsamtools")
h <- scanBamHeader(files=f, what="targets")
(sl <- h[[f]]$targets)
rl <- RangesList()
rl[[names(sl)[1]]] <- IRanges(start=1, end=sl[1], names=names(sl)[1])
sbp <- ScanBamParam(which=rl)
pp <- PileupParam(max_depth=10^4,
min_base_quality=1,
min_mapq=5,
min_nucleotide_depth=1,
distinguish_strands=FALSE,
distinguish_nucleotides=FALSE,
ignore_query_Ns=FALSE)
p <- pileup(file=f, yieldSize=10^4, scanBamParam=sbp, pileupParam=pp)
nrow(p) # 1569
sum(p$count) # 52146
And here is the samtools command:
cmd <- paste0("samtools depth -d ", 10^4, " -q 1 -Q 5 -r ", names(rl), " ", f,
" | awk '{sum+=$3;cnt++}END{print cnt \"\t\" sum}'")
system(cmd)

Do note that you are telling
PileupParamto truncate at a depth of 10^6, but you are telling samtools to truncate at a depth of 10^4. This may or may not result in big differences in your results.@JamesW. MacDonald you're right, my bad, I'll fix the typo. Still, using 10^4 for both doesn't solve the problem.
Probably it would help to identify how the results differ, maybe by focusing on single positions and enumerating the overlapping reads.
@Martin Morgan you're right, I'll look at a few differing positions and will post my solution (if I find one ;)
I eventually looked at the link you mentioned, and wanted to suggest the
coverage()function, withslice()and operations likesum()on *List and Views objects being useful. If you're having trouble finding differences between pileup and samtools, I'd isolate a region viaScanBamParam(which=GRanges(...))and look especially at the flag field of the reads overlapping the region. Also, you can useRsamtools::filterBam()to create a mini-bam file illustrating the discrepancy, and post that somewhere or forward to me via email martin.morgan at roswellpark.org.@Martin Morgan: thanks, I just sent you an email with a minimal, reproducible example.