What does scaling RNA-seq data by the area under coverage (AUC) mean? (recount package)
elmahy2005
@elmahy2005-17126

I am working with the recount package, which downloads preprocessed RNA-seq datasets. In the vignette, I am struggling to understand this:

"Downloaded count data are first scaled to take into account differing coverage between samples.

## Scale counts by taking into account the total coverage per sample
rse1 <- scale_counts(rse_gene1)"

-----------------------------------------------------------------------

The scale_counts() function is as follows (abridged):

scale_counts <- function(rse, by = 'auc', targetSize = 4e7, L = 100,
    factor_only = FALSE, round = TRUE) {
    ...
    ## Scale counts
    if(by == 'auc') {
        # L cancels out:
        # have to multiply by L to get the desired library size,
        # but then divide by L to take into account the read length since the
        # raw counts are the sum of base-level coverage.
        scaleFactor <- targetSize / SummarizedExperiment::colData(rse)$auc
    ...
        scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)),
            ncol = ncol(counts))
        scaledCounts <- counts * scaleMat
        if(round) scaledCounts <- round(scaledCounts, 0)
        SummarizedExperiment::assay(rse, 1) <- scaledCounts
        return(rse)
    }

 

--------------------------------------------------------------------
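As a rough check of the arithmetic in the auc branch, here is a minimal base R sketch on a toy matrix. The helper scale_by_auc() is hypothetical (it is not part of recount) and works on a plain numeric matrix rather than a RangedSummarizedExperiment; the coverage and auc values are made up, and the sketch assumes every read has length L. It compares the one-step factor targetSize / auc with the two-step route described in the code comments (divide the coverage sums by L to get reads, then multiply by targetSize * L / auc), and uses base R sweep() in place of building scaleMat.

```r
# Hypothetical helper, NOT part of recount: mirrors the 'auc' branch of
# scale_counts() on a plain numeric matrix (genes x samples).
scale_by_auc <- function(counts, auc, targetSize = 4e7, round = TRUE) {
  stopifnot(ncol(counts) == length(auc))
  # One factor per sample, as in scale_counts(): targetSize / auc
  scaleFactor <- targetSize / auc
  # sweep() multiplies column j by scaleFactor[j], like counts * scaleMat
  scaled <- sweep(counts, 2, scaleFactor, `*`)
  if (round) scaled <- round(scaled, 0)
  scaled
}

targetSize <- 4e7
L <- 100  # assumed read length

# Made-up base-level coverage sums: 3 genes x 2 samples
coverage <- matrix(c(1000, 5000, 20000,
                     2000, 10000, 40000),
                   nrow = 3,
                   dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
# Made-up total coverage (AUC) per sample; s2 was sequenced twice as deeply
auc <- c(s1 = 1e8, s2 = 2e8)

scaled <- scale_by_auc(coverage, auc, targetSize)

# Two-step route from the code comments: divide by L to turn coverage sums
# into reads, then multiply by the desired total coverage (targetSize * L)
# over the observed total coverage (auc). L cancels out.
two_step <- round(sweep(coverage / L, 2, targetSize * L / auc, `*`), 0)

stopifnot(identical(scaled, two_step))
print(scaled)
```

With these toy numbers both samples end up with the same scaled values (400, 2000 and 8000), since s2 is simply s1 with twice the coverage everywhere; that depth difference is what the scaling removes.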

At first I thought that auc was the library depth (the sum of all read counts in each sample), but I get a different number. What does scaling by auc do? Is it an alternative to normalization?

@lcolladotor

Hi,

I don't know why I didn't get an email about this question. In any case, please check the recount workflow (http://bioconductor.org/packages/release/workflows/html/recountWorkflow.html) published in F1000Research (https://f1000research.com/articles/6-1558/v1). That workflow describes in more detail what the numbers we provide in the RangedSummarizedExperiment objects actually are: sums of base-level coverage, not read counts. The scale_counts() function can be used to go from the numbers we provide to read counts, scaled by default to a library size of targetSize = 4e7 (40 million) reads.

Best,

Leonardo
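To see why auc differs from the sum of the gene-level numbers, the toy base R sketch below builds base-level coverage from a handful of reads on a made-up 1,000 bp genome with two hypothetical genes (all positions and lengths are invented). It assumes that a gene's value is the sum of per-base coverage over its bases, as the scale_counts() comments describe, and that all reads share the same length L. In this sketch auc is the sum of coverage over every base of the genome, so it equals the number of aligned reads times L and also includes reads that fall outside the annotated genes.

```r
# Toy illustration; every position and length here is made up.
L <- 10              # read length, assumed identical for all reads
genome_len <- 1000
read_starts <- c(1, 11, 21, 45, 101, 105, 501, 901, 951)

# Base-level coverage: each aligned read adds 1 to every base it spans
coverage <- integer(genome_len)
for (s in read_starts) {
  idx <- s:(s + L - 1)
  coverage[idx] <- coverage[idx] + 1L
}

# Hypothetical gene annotation, 1-based inclusive positions
genes <- list(geneA = 1:50, geneB = 100:200)

# Gene-level value = sum of base-level coverage over the gene's bases
gene_cov <- vapply(genes, function(pos) sum(coverage[pos]), integer(1))

# AUC = area under the coverage curve over the whole genome
auc <- sum(coverage)

gene_cov        # geneA 36, geneB 20
gene_cov / L    # approximate read counts: 3.6 and 2
auc             # 90, i.e. 9 reads * L
sum(gene_cov)   # 56: reads at 501, 901 and 951 lie outside both genes

# scale_counts()-style scaling with the default targetSize
targetSize <- 4e7
round(gene_cov * targetSize / auc)
```

Dividing the gene values by L gives only approximate read counts: geneA comes out at 3.6 because one read overlaps it by just 6 bases.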
