Question

When running contrasts, does it use normalized read counts?

Katherine • 0

@5b822ad9

Canada

I want to check contrasts between certain samples in my dataset, and I was told that for DESeq2 we need to use raw data that is normalized. I don't have real data to practice this on but I have been going through the code snippets I obtained trying to understand it better. However, I am confused by one thing here.

I've read that DESeq2 is to be used on unnormalized data so it can properly estimate size factors. So was the earlier advice I was given, that it should not be used on raw data, incorrect?
In any case, DESeq2 will normalize the data afterwards, which leads me to think that, for my first question, the input needs to be raw counts. But when running contrasts (which is what I will be doing), I see the code does not use the normalized counts data frame; instead it calls dds, which isn't normalized yet. The code is below. Keep in mind that I will be using this for species data, which tends to be very zero-inflated.

    library(DESeq2)

    dds <- DESeqDataSetFromMatrix(countData = some.counts.data,
                                  colData = some.env.coldata,
                                  design = ~ condition)

    dds <- estimateSizeFactors(dds, type="poscounts") # for 0-inflated data use poscounts
    dds <- estimateDispersions(dds)
    dds <- nbinomWaldTest(dds)
    dds
    normalized_counts <- counts(dds, normalized=TRUE)  # can extract this as a dataframe later on

    #Run below when comparing between groups of interest
    contrast1 <- as.data.frame(results(dds, contrast=c("condition", "CONDITION1", "CONTROL")))

When running the contrasts and calling dds, the counts are not normalized at this point, right? So would this be incorrect? Shouldn't contrasts be done after everything is normalized so the groups are properly compared?

DESeq2 • 383 views

score 1 · Answer 1 · 2024-10-14

A contrast is a comparison of two or more coefficients. When you run nbinomWaldTest you are fitting the GLM and estimating coefficients. The results function simply extracts whatever contrast you are interested in. At that point you are just using the model coefficients to make the comparisons.

Also, there is no 'normalization' of the data in a GLM, but instead the size factors are used as an offset to control for the library size (normalized counts are not used for anything except maybe for plotting, but in that case you might want to use vst or rlog values).
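To make the offset idea concrete, here is a toy sketch in base R. It is not what DESeq2 runs internally: a Poisson GLM stands in for the negative binomial GLM, and the size factors, group means and variable names are all made up for illustration. The point is only that the raw counts are the response and the size factors enter on the log scale as an offset.

```r
# Toy illustration in base R. A Poisson GLM stands in for DESeq2's
# negative binomial GLM; every number below is invented for this sketch.
set.seed(1)

condition <- factor(rep(c("CONTROL", "CONDITION1"), each = 4),
                    levels = c("CONTROL", "CONDITION1"))

# Hypothetical per-sample size factors (sequencing depth differences)
size_factor <- c(0.5, 1.0, 2.0, 1.5, 0.8, 1.2, 2.5, 1.0)

# True underlying abundance: CONDITION1 is 2-fold higher than CONTROL
true_mean <- ifelse(condition == "CONDITION1", 200, 100)

# Observed raw counts scale with both the abundance and the size factor
raw_counts <- rpois(length(condition), lambda = true_mean * size_factor)

# The raw counts are modelled directly; size factors are a log-scale offset
fit <- glm(raw_counts ~ condition + offset(log(size_factor)),
           family = poisson)

# glm() reports natural-log coefficients; convert to a log2 fold change
log2fc <- unname(coef(fit)["conditionCONDITION1"]) / log(2)
log2fc
```

The estimated log2 fold change should land close to 1 (the simulated 2-fold difference) even though no count was ever divided by its size factor.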

In other words, estimateSizeFactors computes per-sample size factors (which account for library size) to use as offsets in the GLM, estimateDispersions estimates the gene-wise dispersions, and then nbinomWaldTest fits the GLM and estimates the model coefficients. Then results computes any contrast you might care about and puts it into a DataFrame that you can inspect. The counts stored in dds are never replaced by normalized values.
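If you want to convince yourself, a check along these lines may help. It is a sketch on simulated data from makeExampleDESeqDataSet (which uses condition levels "A" and "B", not your CONTROL/CONDITION1), and it assumes DESeq2 is installed. It checks three things: the counts in dds stay raw after the whole pipeline, counts(dds, normalized=TRUE) is just the raw counts divided by the size factors, and the log2 fold change from results matches the fitted coefficient.

```r
library(DESeq2)

set.seed(42)
# Simulated data set, for illustration only; conditions are "A" and "B"
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

dds <- estimateSizeFactors(dds, type = "poscounts")
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

# 1. The counts stored in dds are still the raw integer counts
raw <- counts(dds)
still_raw <- is.integer(raw)

# 2. "Normalized" counts are raw counts divided column-wise by size factors
norm   <- counts(dds, normalized = TRUE)
manual <- sweep(raw, 2, sizeFactors(dds), "/")
same_norm <- isTRUE(all.equal(norm, manual))

# 3. results() reads the contrast off the fitted coefficients
res <- results(dds, contrast = c("condition", "B", "A"))
same_lfc <- isTRUE(all.equal(res$log2FoldChange,
                             unname(coef(dds)[, "condition_B_vs_A"])))

c(still_raw = still_raw, same_norm = same_norm, same_lfc = same_lfc)
```

The normalized_counts object in your code is therefore a side product for inspection or plotting; the contrast never touches it.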