Question: DESeq2: which normalized data matrix should I take?
Xiaokuan Wei (United States), 4.6 years ago, wrote:

Hi,

I want to extract the normalized data matrix (read-count matrix) and do the differential gene expression analysis myself, instead of using the Wald test the package provides, because I don't have replicates in each comparison group. So there is no statistical testing, only fold changes between two samples. I am going to use a vsn-normalized matrix for this. My question is: what are the advantages and drawbacks of using a normalized data matrix instead of the raw counts? Which data matrix (counts) is appropriate for this type of analysis?

Thank you

-W

Answer:

Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA), 4.6 years ago, wrote:

If you're just going to do a descriptive analysis using fold changes, then you probably want to apply variance stabilization using the regularized log transformation (rlog) in DESeq2, which gives you normalized, variance-stabilized values on the log2 scale. See the help page for the rlog function. Since you have no replicates, you'll need to use blind=TRUE. You can also use varianceStabilizingTransformation for similar purposes, but it is more sensitive to differences in sequencing depth and sample complexity.

The primary effect of variance stabilization in RNA-seq data is to reduce the magnitude of fold changes for low-count genes. This counteracts the tendency of low-count genes to have very large fold changes (and high variance) since small random variations in low-count genes are larger relative to the counts themselves. You can see an example of how the regularization affects the data here: http://www.sthda.com/english/wiki/rna-seq-differential-expression-work-flow-using-deseq2#the-rlog-transform
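To make the low-count effect concrete, here is a minimal Python sketch. It is not DESeq2's rlog, which fits a shrinkage model. Instead it compares raw log2 ratios with log2 ratios computed after adding a pseudocount of 1, a crude stand-in for the damping that rlog and VST perform more carefully. The gene names and counts are hypothetical.

```python
import math

def log2_fold_change(count_a, count_b, pseudocount=0.0):
    """Log2 ratio of count_b over count_a, optionally adding a pseudocount to both."""
    return math.log2((count_b + pseudocount) / (count_a + pseudocount))

# Two hypothetical genes with the same 4-fold raw ratio: one low-count, one high-count.
genes = {"low_count_gene": (1, 4), "high_count_gene": (1000, 4000)}

for name, (a, b) in genes.items():
    raw = log2_fold_change(a, b)
    damped = log2_fold_change(a, b, pseudocount=1.0)
    print(f"{name}: raw log2FC = {raw:.3f}, with pseudocount 1 = {damped:.3f}")
```

Both genes have a raw log2 fold change of 2. The pseudocount pulls the low-count gene well toward zero but leaves the high-count gene almost unchanged, which is the qualitative behaviour described above.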

Answer:

Michael Love (United States), 4.6 years ago, wrote:

In addition to Ryan's suggestions, note that you can simply run DESeq() to calculate fold changes. It will detect that there are no replicates and automatically perform "blind" dispersion estimation by treating the different samples as replicates (it prints a warning when it does this). At the results stage, you can use the moderated fold changes (the log2FoldChange column). Alternatively, if you set addMLE=TRUE in results(), you can compare the moderated fold changes with the MLE (maximum likelihood estimate, i.e. unmoderated) fold changes.
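As a rough intuition for why the moderated and MLE columns differ, here is a toy Python sketch of shrinkage under a zero-centered normal prior. This is not DESeq2's procedure: DESeq2 fits a negative binomial GLM and estimates the prior width from the data. Here the MLE is simply assumed to be approximately normal, and the standard errors and `prior_sd` are hypothetical values chosen for illustration.

```python
def shrink_lfc(mle_lfc, standard_error, prior_sd):
    """Posterior mode of a log fold change under a zero-centered normal prior,
    assuming the MLE is approximately normal with the given standard error."""
    # The weight approaches 1 when the estimate is precise (small standard error)
    # and approaches 0 when it is noisy (large standard error).
    weight = prior_sd**2 / (prior_sd**2 + standard_error**2)
    return weight * mle_lfc

# Hypothetical genes: same MLE log2 fold change, very different precision.
for name, lfc, se in [("well_measured", 2.0, 0.1), ("low_count", 2.0, 1.5)]:
    print(name, round(shrink_lfc(lfc, se, prior_sd=1.0), 3))
```

The precisely measured gene keeps nearly all of its fold change. The noisy, low-count gene is pulled strongly toward zero, mirroring the gap you would typically see between log2FoldChange and the MLE column for low-count genes.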

Answer:

Xiaokuan Wei (United States), 4.6 years ago, wrote:

Ryan, Michael:

Thank you for your informative answers to my question. I have one more question about this process.

As I understand it, the fold changes obtained from DESeq() are computed from the raw counts rather than from normalized values (rlog or vsn).

So if I extract the normalized matrix and then calculate fold changes between the two samples, the results will be slightly different from the fold changes obtained from DESeq(). Is this right?

Thank you.

-W


The fold changes calculated by DESeq() will not be the same as those obtained from rlog or VST data. This holds for both the moderated fold changes (the default) and the unmoderated ones (from addMLE=TRUE in results() or betaPrior=FALSE in DESeq()). The moderated fold changes are calculated as described in the DESeq2 paper. In a simple group comparison, the unmoderated fold change for gene i is:

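$$\frac{\operatorname{mean}_{j \in B}\,(K_{ij}/s_j)}{\operatorname{mean}_{j \in A}\,(K_{ij}/s_j)}$$

where K_ij is the raw count for gene i in sample j, s_j is the size factor of sample j, and A and B are the sets of samples in the two groups.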
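The unmoderated fold change above can be sketched in Python. This sketch computes the size factors from scratch with the median-of-ratios estimator described in the DESeq2 paper (the default behind estimateSizeFactors). The function names and the toy count matrix are hypothetical, and this is not DESeq2's own code. With a single sample per group, as in the original question, each mean reduces to that sample's normalized count.

```python
import math
from statistics import mean, median

def size_factors(counts):
    """Median-of-ratios size factors s_j, one per sample (column).

    counts: list of rows (genes), each a list of raw counts K_ij per sample.
    Genes with a zero in any sample are skipped, since their geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(k > 0 for k in row)]
    # Log of each gene's geometric mean across samples.
    log_geo_means = [mean(math.log(k) for k in row) for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def unmoderated_fold_change(counts, factors, group_a, group_b):
    """Per-gene ratio of mean normalized counts (K_ij / s_j), group B over group A."""
    result = []
    for row in counts:
        norm = [k / s for k, s in zip(row, factors)]
        result.append(mean(norm[j] for j in group_b) / mean(norm[j] for j in group_a))
    return result

# Hypothetical 4-gene x 2-sample matrix: sample 1 was sequenced about twice as deeply,
# and only the last gene is truly up-regulated.
counts = [[10, 20], [100, 200], [50, 100], [5, 40]]
s = size_factors(counts)
print(s)
print(unmoderated_fold_change(counts, s, group_a=[0], group_b=[1]))
```

The size factors absorb the twofold difference in depth, so the first three genes come out with a fold change of about 1. The last gene has a raw ratio of 8, and after normalization it shows a fold change of about 4.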