I am working with the recount package, which downloads preprocessed RNA-seq datasets. In the vignette, I am struggling to understand this:
"Downloaded count data are first scaled to take into account differing coverage between samples.
## Scale counts by taking into account the total coverage per sample
rse1 <- scale_counts(rse_gene1)
"
-----------------------------------------------------------------------
The scale_counts function is as follows:
scale_counts <- function(rse, by = 'auc', targetSize = 4e7, L = 100,
    factor_only = FALSE, round = TRUE) {
    ...
    ## Scale counts
    if(by == 'auc') {
        # L cancels out:
        # have to multiply by L to get the desired library size,
        # but then divide by L to take into account the read length since the
        # raw counts are the sum of base-level coverage.
        scaleFactor <- targetSize / SummarizedExperiment::colData(rse)$auc
    ...
    scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)),
        ncol = ncol(counts))
    scaledCounts <- counts * scaleMat
    if(round) scaledCounts <- round(scaledCounts, 0)
    SummarizedExperiment::assay(rse, 1) <- scaledCounts
    return(rse)
}
--------------------------------------------------------------------
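To make the question concrete, here is a minimal base R sketch of the arithmetic I think the by = 'auc' branch performs. The counts matrix and the auc values are invented for illustration and do not come from recount; only the scaling lines follow the function quoted above, so this is my reading of it rather than a confirmed description.

```r
# Toy counts matrix: 3 genes x 2 samples (made-up numbers, not recount data)
counts <- matrix(
    c(100, 200, 300, 400, 500, 600),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), c("s1", "s2"))
)

# Hypothetical per-sample auc values (stand-ins for colData(rse)$auc)
auc <- c(s1 = 2e9, s2 = 4e9)

# Default target size from the function signature quoted above
targetSize <- 4e7

# One scaling factor per sample
scaleFactor <- targetSize / auc

# Repeat each sample's factor down its column, then scale and round
scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)), ncol = ncol(counts))
scaledCounts <- round(counts * scaleMat, 0)

print(scaleFactor)
print(scaledCounts)
```

Each column of counts is multiplied by a single per-sample factor, targetSize / auc, so the result depends on auc rather than on colSums(counts).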
At first I thought that auc was the library depth (the sum of all read counts in each sample), but I get a different number. What does scaling by auc mean? Is it an alternative to normalization?