Question: What does scaling RNASeq data using the area under coverage information (AUC) mean? (related to the recount package)
11 months ago, elmahy2005 wrote:

I am working with the recount package, which downloads preprocessed RNASeq datasets. In the vignette, I am struggling to understand this:

"Downloaded count data are first scaled to take into account differing coverage between samples.

## Scale counts by taking into account the total coverage per sample
rse1 <- scale_counts(rse_gene1)
"

-----------------------------------------------------------------------

The scale_counts function is as follows:

scale_counts <- function(rse, by = 'auc', targetSize = 4e7, L = 100,
    factor_only = FALSE, round = TRUE) {
...
    ## Scale counts
    if(by == 'auc') {
        # L cancels out:
        # have to multiply by L to get the desired library size,
        # but then divide by L to take into account the read length since the
        # raw counts are the sum of base-level coverage.
        scaleFactor <- targetSize / SummarizedExperiment::colData(rse)$auc
    ...
        scaleMat <- matrix(rep(scaleFactor, each = nrow(counts)),
            ncol = ncol(counts))
        scaledCounts <- counts * scaleMat
        if(round) scaledCounts <- round(scaledCounts, 0)
        SummarizedExperiment::assay(rse, 1) <- scaledCounts
        return(rse)
    }

 

--------------------------------------------------------------------

At first I thought that auc was the library depth (the sum of all read counts in each sample), but I get a different number. What does scaling by auc mean? Is it an alternative to normalization?

 

Answer: What does scaling RNASeq data using the area under coverage information (AUC) me
10 months ago, Leonardo Collado Torres (United States) wrote:

Hi,

I don't know why I didn't get an email about this question. In any case, please check the recount workflow (http://bioconductor.org/packages/release/workflows/html/recountWorkflow.html), published in F1000 Research: https://f1000research.com/articles/6-1558/v1. That workflow describes in more detail what the numbers we provide in the RangedSummarizedExperiment objects actually are. The scale_counts() function can be used to go from the numbers we provide to actual read counts.
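To make the arithmetic concrete, here is a rough toy sketch in base R. It is not the recount code itself: the matrix `reads`, the gene and sample names, and every number in it are made up for illustration, and it assumes that all reads have length `L` and that every mapped read falls entirely inside one of the three genes. Under those assumptions the counts are sums of base-level coverage, so the AUC comes out as `L` times the number of reads rather than the number of reads, and multiplying by `targetSize / auc` (the same expression as in the `by = 'auc'` branch quoted in the question) brings every sample to the same total.

```r
## Toy illustration in base R only; all names and numbers are hypothetical.
L <- 100            # read length, assumed identical for all reads
targetSize <- 4e7   # default target library size in scale_counts()

## Hypothetical number of reads per gene (rows) in two samples (columns)
reads <- matrix(c(10, 30, 60,
                  40, 120, 240),
                nrow = 3,
                dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

## Coverage-style counts: sum of base-level coverage, so each read that lies
## fully inside a gene contributes L instead of 1
counts <- reads * L

## Assumption for this toy case: all coverage falls inside these genes,
## so the AUC (total coverage) equals the column sums of the counts
auc <- colSums(counts)

## Same arithmetic as the by = 'auc' branch: one factor per sample (column)
scaleFactor <- targetSize / auc
scaledCounts <- round(sweep(counts, 2, scaleFactor, `*`), 0)

auc / colSums(reads)    # L for both samples: AUC is not the number of reads
colSums(scaledCounts)   # both samples now sum to targetSize
```

In real data the AUC is, as far as I understand it, computed from all of the sample's coverage and not only from the annotated genes, so it will generally not match the column sums of the gene-level counts either.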

Best,

Leonardo
