I am working with the recount package, which downloads preprocessed RNA-seq datasets. In the vignette, I am struggling to understand this passage:
"Downloaded count data are first scaled to take into account differing coverage between samples.
## Scale counts by taking into account the total coverage per sample
rse1 <- scale_counts(rse_gene1)"
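Trying to follow the "L cancels out" comments in the function, my reading of the arithmetic is the following. It assumes that auc is the total base-level coverage of sample j (written $A_j$), that $c_{ij}$ is the raw count of gene i in sample j (a sum of base-level coverage, as the comments say), that $T$ is targetSize (4e7 by default) and that $L$ is the read length (100 by default). Rescale the coverage to a library of $T$ reads of length $L$, then divide by $L$ to go from bases back to reads:

$$
\tilde{c}_{ij} = c_{ij} \cdot \frac{T \cdot L}{A_j} \cdot \frac{1}{L} = c_{ij} \cdot \frac{T}{A_j}
$$

This matches scaleFactor <- targetSize / auc: each sample gets a single multiplicative factor.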
-----------------------------------------------------------------------
The scale_counts function is as follows:
scale_counts <- function(rse, by = 'auc', targetSize = 4e7, L = 100,
    factor_only = FALSE, round = TRUE) {
    ...
    ## Scale counts
    if(by == 'auc') {
        # L cancels out:
        # have to multiply by L to get the desired library size,
        # but then divide by L to take into account the read length since the
        # raw counts are the sum of base-level coverage.
        scaleFactor <- targetSize / SummarizedExperiment::colData(rse)$auc
    ...
    scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)),
        ncol = ncol(counts))
    scaledCounts <- counts * scaleMat
    if(round) scaledCounts <- round(scaledCounts, 0)
    SummarizedExperiment::assay(rse, 1) <- scaledCounts
    return(rse)
}
--------------------------------------------------------------------
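To see what the 'auc' branch does to a count matrix, here is a toy reproduction in base R. It is only a sketch of the arithmetic shown above, not the package code. The counts and auc values are made up, and no recount or SummarizedExperiment objects are involved.

```r
# Toy reproduction of the 'auc' branch of scale_counts(), base R only.
# All numbers are hypothetical: 3 genes x 2 samples of raw counts
# (sums of base-level coverage) and made-up per-sample auc values.
counts <- matrix(c(1000, 5000, 200,
                   3000, 9000, 600),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
auc <- c(s1 = 2e9, s2 = 8e9)   # hypothetical total coverage per sample
targetSize <- 4e7              # default targetSize in scale_counts()

# One factor per sample, as in the function: targetSize / auc
scaleFactor <- targetSize / auc

# Same construction as the function: repeat each factor down its column
scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)),
                   ncol = ncol(counts))
scaledCounts <- round(counts * scaleMat, 0)

print(scaleFactor)
print(scaledCounts)

# After scaling, every sample's auc corresponds to the same targetSize
print(auc * scaleFactor)
```

With these made-up numbers, s1 is multiplied by 0.02 and s2 by 0.005. Because each column is multiplied by a single number, the ratios between genes within a sample are unchanged (apart from rounding, which is on by default via round = TRUE). Only the overall level changes, so samples with very different total coverage end up on a comparable scale.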
At first I thought that auc was the library depth (the sum of all read counts in each sample), but when I compute that sum I get a different number. What does scaling by auc actually do? Is it an alternative to normalization?
