Dear @gtho123,
Thanks for your question!
By design bamsignals::bamProfile()
and bamsignals::bamCount()
work on the read level which allows for the quantification of read counts on each strand individually. On the contrary, bamsignals::bamCoverage()
works on the fragment level which does not allow for strand-specific DNA fragment coverage because the strand of one DNA fragment is effectively undefined.
However, if a strand-specific paired end protocol (e.g. RNA-seq) is used, a strand-specific fragment coverage can be still be defined where the first read in a pair defines the strand that gave rise to this fragment. In this case, you can achieve a strand-specific coverage by filtering on the SAMFLAG with the filteredFlag
argument ( see https://broadinstitute.github.io/picard/explain-flags.html ):
If you deal with single end libraries, I suggest you use bamsignals::bamProfile()
to filter on the strand and infer the coverage based on the expected fragment length ( see https://github.com/lamortenera/bamsignals/issues/16 ). For example,
bamCoverageFrags <- function(bampath, gr, fragLength=200, ss = TRUE, ...) {
prf <- bamProfile(bampath, gr, binsize=1, ss = TRUE, ...)
cov <- lapply(as.list(prf), function(p) {
l <- dim(p)[2]
x1 <- cumsum(as.numeric(p[1,]))
y1 <- c(rep(0, fragLength), x1[1:(l-fragLength)])
x2 <- cumsum(as.numeric(rev(p[2,])))
y2 <- c(rep(0, fragLength), x2[1:(l-fragLength)])
matrix(c(x1-y1, rev(x2-y2)), ncol = l, byrow = T,
dimnames = list(c("sense", "antisense"), NULL))
})
if (ss == TRUE) {
prf@signals <- cov
} else {
prf@signals <- lapply(cov, colSums)
}
invisible(prf)
}
Johannes
Just a note that the original question sent email to the maintainer, who had an autoreply generating a comment on the original question that noted that they were out of the office until (23 January?) which generated an email to the maintainer which generated an autoreply... and drastic steps were taken on our end to stop this loop; the maintainer will see your message when they return...