Question: What does scaling RNASeq data using the area under coverage information (AUC) mean? - related to recount package
13 months ago by
elmahy20050 wrote:

I am working with the recount package, which downloads preprocessed RNASeq datasets. I am struggling to understand this part of the vignette:

"Downloaded count data are first scaled to take into account differing coverage between samples."

## Scale counts by taking into account the total coverage per sample
rse1 <- scale_counts(rse_gene1)



The scale_counts function is as follows (abridged):

scale_counts <- function(rse, by = 'auc', targetSize = 4e7, L = 100,
    factor_only = FALSE, round = TRUE) {
    ## Extract the coverage-based counts from the first assay
    counts <- SummarizedExperiment::assay(rse, 1)

    ## Scale counts
    if(by == 'auc') {
        # L cancels out:
        # have to multiply by L to get the desired library size,
        # but then divide by L to take into account the read length since the
        # raw counts are the sum of base-level coverage.
        scaleFactor <- targetSize / SummarizedExperiment::colData(rse)$auc
    }

    ## Multiply each sample (column) by its own scaling factor
    scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)),
        ncol = ncol(counts))
    scaledCounts <- counts * scaleMat
    if(round) scaledCounts <- round(scaledCounts, 0)
    SummarizedExperiment::assay(rse, 1) <- scaledCounts
    return(rse)
}





At first I thought that auc was the library depth (the sum of all read counts in each sample), but I get a different number. What is scaling by auc? Is it an alternative to normalization?


12 months ago by
Leonardo Collado Torres (United States) wrote:


I don't know why I didn't get an email about this question. In any case, please check the recount workflow, published in F1000Research. That workflow describes in more detail what the numbers we provide in the RangedSummarizedExperiment objects actually are. The scale_counts() function can be used to go from the numbers we provide to actual read counts.
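
As a rough illustration of the arithmetic in the 'auc' branch, here is a minimal base R sketch with made-up numbers (not recount data). It assumes that every read has the same length L and falls entirely within one gene, so each read adds L to the summed base-level coverage and the AUC is simply the total of that coverage. The object names (reads, coverage_counts, target_size) are hypothetical and chosen only for this example. In real data the AUC is summed over all covered bases, which would explain why it does not match the sum of read counts per sample.

```r
# Toy illustration with made-up numbers, not recount data.
# Assumption: every read has length L and maps entirely within one gene.
L <- 100                      # read length in bases
reads <- matrix(              # hypothetical per-gene read counts
    c(10, 30, 60,
      40, 120, 240),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)

# Each read adds L to the summed base-level coverage of its gene.
coverage_counts <- reads * L

# In this toy setup all coverage lies in genes, so AUC = column sums.
auc <- colSums(coverage_counts)

target_size <- 4e7            # same default as targetSize in scale_counts()
scale_factor <- target_size / auc

# sweep() multiplies each column by that sample's scaling factor.
scaled <- round(sweep(coverage_counts, 2, scale_factor, `*`), 0)

# Scaling the read counts directly to the target library size agrees,
# which is the sense in which L cancels out.
expected <- round(sweep(reads, 2, target_size / colSums(reads), `*`), 0)
stopifnot(isTRUE(all.equal(scaled, expected)))
```

Under these assumptions both samples end up with column sums equal to target_size, so the scaled values can be read as counts from a library of 40 million reads.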



