Question

DESeq2 step-by-step Normalization then DGE separately & Normalization methods and switchability

0

Entering edit mode

RL • 0

@d976f52f

Last seen 26 days ago

France

Hi everyone,

Sorry for the inconvenience, I was hoping that Michael Love in particular could help me. On the same dds DESeq2 object, I performed a DGEAnalysis with DESeq2, through 2 approachs :

One all-steps-in-one
```
dds <- DESeq(dds)
```
and one step-by-step dissected
```
dds <- estimateSizeFactors(dds)
# normcnt <- counts(dds, normalized=TRUE)
# write.csv(normcnt, file = "normcnt.csv", row.names = TRUE)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
```
(even if longer, I prefer a step-by-step approach in my pipeline context mainly as you can see because I can easily extract the normalized table without redoing that computing).

But I do not obtain exactly the same result analysis. With step-by-step, Log2FoldChanges seem to be the same, but padj and others are not, for each gene, exactly the same, and I have very slightly less DE genes.

Is my step-by-step not strictly equivalent to DESeq() ? If so, do you see what to modify ?

And (if you have the time) another question, what is officially the default DESeq2 normalization method name, Means of Ratio Method, Size Factors Method ? And if others available in DESeq2 (I understand that), is there a list (I didn't find), with formulas, and how to set it in parameter ? For example :

dds <- estimateSizeFactors(dds, normalizationMethod="TMM")# ?

Thank you for your time and your help.

DESeq2 Normalization • 409 views

ADD COMMENT • link updated 26 days ago by Michael Love 41k • written 27 days ago by RL • 0

James W. MacDonald · Answer 1 · 2024-04-03

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 3 minutes ago

United States

I can't reproduce.

> z <- makeExampleDESeqDataSet()
> dds <- DESeq(z)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> dds2 <- estimateSizeFactors(z)
> dds2 <- estimateDispersions(dds2)
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
> dds2 <- nbinomWaldTest(dds2)
> all.equal(results(dds), results(dds2))
[1] TRUE
>

ADD COMMENT • link 27 days ago James W. MacDonald 65k

0

Entering edit mode

Thank you very much for your quick cross-check help.

I have redone, and sadly unchanged.

baseMean log2FoldChange lfcSE stat pvalue padj

257.3359592 -0.70613952 0.082681159 -8.540513101 1.34E-17 1.91E-13

1187.186642 -0.717965588 0.098872298 -7.261544441 3.83E-13 2.73E-09

82.09032957 3.941975841 0.551440677 7.148503914 8.77E-13 4.18E-09

688.0855115 0.60537706 0.090809498 6.666450939 2.62E-11 9.36E-08

636.4854582 -0.735240103 0.11340724 -6.483184884 8.98E-11 2.14E-07

...

614.5689018 -0.165047271 0.061945113 -2.664411485 0.007712316 0.485337531

1849.221224 0.122149957 0.045839434 2.664735273 0.007704895 0.485337531

25.17306252 0.598317756 0.224820501 2.661313146 0.007783653 0.487604891

4736.526392 -0.37558073 0.141207776 -2.659773703 0.007819317 0.487700014   






baseMean log2FoldChange lfcSE stat pvalue padj

257.3359592 -0.70613952 0.082681159 -8.540513101 1.34E-17 1.90E-13

1187.186642 -0.717965588 0.098872298 -7.261544441 3.83E-13 2.73E-09

82.09032957 3.941975841 0.551440677 7.148503914 8.77E-13 4.17E-09

688.0855115 0.60537706 0.090809498 6.666450939 2.62E-11 9.33E-08

636.4854582 -0.735240103 0.11340724 -6.483184884 8.98E-11 2.13E-07

...

614.5689018 -0.165047271 0.061945113 -2.664411485 0.007712316 0.499377638

1849.221224 0.122149957 0.045839434 2.664735273 0.007704895 0.499377638  

25.17306252 0.598317756 0.224820501 2.661313146 0.007783653 0.501640584

4736.526392 -0.37558073 0.141207776 -2.659773703 0.007819317 0.501669056

So it is

either, anomaly in my dataset in comparison to internal example dataset makeExampleDESeqDataSet(),

or, an influence of the DESeq matrix object definition before the following DGE analysis (even if I strictly used the same counts table),

 dds <- DESeqDataSetFromMatrix(countData = counts,
                               colData = coldata,
                               design = ~ condition)

(or, mystery ...).

ADD REPLY • link updated 26 days ago by James W. MacDonald 65k • written 26 days ago by RL • 0

0

Entering edit mode

The only difference I can see is in the adjusted p-values, and those will only change (given the same input p-values) if the number of tests vary.

ADD REPLY • link 26 days ago James W. MacDonald 65k

1

Entering edit mode

DESeq() also does refitting of outliers. Try minRep=Inf to turn that off.

ADD REPLY • link 26 days ago Michael Love 41k

0

Entering edit mode

Wonderful ! It works. So :

dds <- DESeq(dds, minRep=Inf)

I suppose that if you configured by default DESeq() with that minRep as not Inf, it is a good thing and must be kept, so, to keep this default setting, as you said elsewhere, "[DESeq()] uses the pre-existing sizeFactors(dds) however they were created (or simply assigned).", I can simply do

dds <- estimateSizeFactors(dds)
# Here eventually extract by-DESeq2 normalized table
dds <- DESeq(dds)# (automatically not redoing estimateSizeFactors)
# Here extract DGE results

?

Is it possible to set that minRep= not Inf manually as a step (function or parameter), in this context :

dds <- estimateSizeFactors(dds)
# Export normalized table
# ...
# Import normalized table and perform DGE analysis with minRep= NOT Inf by :
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)

(not found in documented parameters of these functions) ?

Thank you very much for your help, I m a student so sorry if too evident question, I will avoid to bother you more.

Have a good day.

ADD REPLY • link 26 days ago RL • 0

0

Entering edit mode

The thing it is doing is not part of the sub functions.

I don't love the outlier replacement but we leave it there for consistency. I often turn it off myself for large datasets and use manual inspection.

ADD REPLY • link 26 days ago Michael Love 41k