Question

What does scaling RNA-seq data using the area under coverage (AUC) mean? (related to the recount package)

elmahy2005

I am working with the recount package, which downloads preprocessed RNA-seq datasets. In the vignette, I am struggling to understand this passage:

"Downloaded count data are first scaled to take into account differing coverage between samples."

```r
## Scale counts by taking into account the total coverage per sample
rse1 <- scale_counts(rse_gene1)
```
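To make the vignette's one-liner concrete, here is a minimal base-R sketch of the AUC scaling step, written to mirror the `scaleFactor` / `scaleMat` lines of the `scale_counts` excerpt in this post. The helper name `scale_by_auc`, the toy matrix and the AUC values are hypothetical illustrations, not part of recount; the real `scale_counts()` operates on a RangedSummarizedExperiment and reads `auc` from its `colData`.

```r
# Hypothetical helper mirroring the AUC branch of recount's scale_counts():
# every column (sample) is multiplied by targetSize / auc for that sample.
scale_by_auc <- function(counts, auc, targetSize = 4e7, round = TRUE) {
    stopifnot(ncol(counts) == length(auc))
    scaleFactor <- targetSize / auc
    # Same effect as building scaleMat: column j is multiplied by scaleFactor[j]
    scaled <- sweep(counts, 2, scaleFactor, `*`)
    if (round) scaled <- round(scaled, 0)
    scaled
}

# Toy data (hypothetical): sample B was sequenced twice as deeply as sample A,
# so both its per-gene coverage sums and its AUC are doubled.
counts <- matrix(c(1e5, 3e5, 6e5,
                   2e5, 6e5, 12e5),
                 ncol = 2,
                 dimnames = list(c("gene1", "gene2", "gene3"), c("A", "B")))
auc <- c(A = 4e9, B = 8e9)

scale_by_auc(counts, auc)  # both columns become 1000, 3000, 6000
```

After scaling, the depth difference between A and B disappears. Using `sweep()` instead of an explicit `matrix(rep(...))` is only a stylistic choice; both multiply each column by its own sample's factor.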

-----------------------------------------------------------------------

The scale_counts function is as follows:

```r
scale_counts <- function(rse, by = 'auc', targetSize = 4e7, L = 100,
    factor_only = FALSE, round = TRUE) {
    ...
    ## Scale counts
    if(by == 'auc') {
        # L cancels out:
        # have to multiply by L to get the desired library size,
        # but then divide by L to take into account the read length since the
        # raw counts are the sum of base-level coverage.
        scaleFactor <- targetSize / SummarizedExperiment::colData(rse)$auc
    ...
    scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)),
        ncol = ncol(counts))
    scaledCounts <- counts * scaleMat
    if(round) scaledCounts <- round(scaledCounts, 0)
    SummarizedExperiment::assay(rse, 1) <- scaledCounts
    return(rse)
}
```
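The code comment "L cancels out" can be spelled out under a simplifying assumption: all reads have the same length L, so a gene's raw value in recount (its summed base-level coverage, c_g) is roughly L times the number of reads on that gene, and the sample's AUC (total base-level coverage) is roughly L times the total number of mapped reads. The symbols r_g (reads on gene g) and R (total mapped reads) are introduced here only for illustration.

$$
\begin{aligned}
c_g &\approx L\, r_g, \qquad \text{auc} \approx L\, R,\\
\tilde{c}_g &= c_g \cdot \frac{\text{targetSize}}{\text{auc}}
  \approx L\, r_g \cdot \frac{\text{targetSize}}{L\, R}
  = r_g \cdot \frac{\text{targetSize}}{R}.
\end{aligned}
$$

So the scaled value is approximately the number of reads gene g would have if the sample had been sequenced to targetSize = 4e7 (40 million) reads, similar in spirit to counts per million but per 40 million. Roughly speaking, this is a sequencing-depth adjustment rather than a full normalization: methods such as edgeR's TMM or DESeq2's size factors also account for differences in RNA composition and can still be applied to the scaled counts.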

--------------------------------------------------------------------

At first I thought that `auc` was the library depth (the sum of all read counts in each sample), but I get a different number. What does scaling by AUC do? Is it an alternative to normalization?
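A small simulation may help explain why `auc` is not the number of reads. It assumes single-end reads of one fixed length L, and the genome length, read count and gene coordinates are hypothetical, not recount data. Each read adds 1 to every base it covers, so the area under the coverage curve is about L times the number of mapped reads, and dividing a gene's summed coverage by L gives back roughly its read count.

```r
set.seed(1)

L <- 100           # read length (all reads assumed to have this length)
genome_len <- 1e5  # hypothetical genome length in bases
n_reads <- 2000    # hypothetical number of mapped reads

# Random read start positions, chosen so every read fits inside the genome
starts <- sample.int(genome_len - L + 1, n_reads, replace = TRUE)

# Base-level coverage: each read adds 1 to every base it covers
coverage <- integer(genome_len)
for (s in starts) {
    idx <- s:(s + L - 1)
    coverage[idx] <- coverage[idx] + 1L
}

# AUC = area under the coverage curve = sum of coverage over all bases
auc <- sum(coverage)
auc      # 200000, i.e. n_reads * L, not n_reads
auc / L  # 2000, the number of reads

# A hypothetical gene spanning bases 20001-30000
gene <- 20001:30000
gene_cov_sum <- sum(coverage[gene])  # recount-style value: summed coverage
# Reads lying entirely inside the gene
reads_in_gene <- sum(starts >= 20001 & starts + L - 1 <= 30000)
# The coverage-based estimate is at least the number of reads fully inside,
# because reads that only partly overlap the gene also add coverage
c(coverage_sum_over_L = gene_cov_sum / L, reads_fully_inside = reads_in_gene)
```

Real recount AUC values come from the coverage of all mapped reads, so the exact ratio to the read count depends on read length and library type; the toy only illustrates why the two numbers differ by roughly a factor of L.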

Answer by Leonardo Collado Torres · 2018-09-28

Hi,

I'm not sure why I didn't get an email about this question. In any case, please check the recount workflow (http://bioconductor.org/packages/release/workflows/html/recountWorkflow.html), published in F1000Research (https://f1000research.com/articles/6-1558/v1). That workflow describes in more detail what the numbers we provide in the RangedSummarizedExperiment objects actually are. The scale_counts() function can be used to convert the numbers we provide into read counts.

Best,

Leonardo