Question: DESeq2: which normalized data matrix should I take?
Xiaokuan Wei (United States) wrote 2.2 years ago:

Hi,

I want to extract the normalized data matrix (read count matrix) and do the differential gene expression analysis myself, instead of using the Wald test that the package provides, because I don't have replicates in either comparison group. So there is no statistical testing, only the fold change between two samples. I am going to use the vsn-normalized matrix for this. My questions are: what are the advantages and drawbacks of using a normalized data matrix instead of the raw counts, and which data matrix (counts) is appropriate for this type of analysis?

Thank you

 

-W

Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote 2.2 years ago:

If you're just going to do a descriptive analysis using fold changes, then you probably want to do variance stabilization using the regularized log transformation (rlog) in DESeq2, which will give you values on the log2 scale that are normalized for library size and variance-stabilized. See the help page for the rlog function. Since you have no replicates, you'll need to use blind=TRUE. You can also use varianceStabilizingTransformation for similar purposes, but it is more sensitive to differences in sequencing depth and sample complexity.

The primary effect of variance stabilization in RNA-seq data is to reduce the magnitude of fold changes for low-count genes. This counteracts the tendency of low-count genes to have very large fold changes (and high variance) since small random variations in low-count genes are larger relative to the counts themselves. You can see an example of how the regularization affects the data here: http://www.sthda.com/english/wiki/rna-seq-differential-expression-work-flow-using-deseq2#the-rlog-transform
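As a rough illustration of why low-count genes produce inflated fold changes, here is a minimal plain-Python sketch. It is not the rlog: adding a fixed pseudocount before taking the log is only a crude stand-in for shrinkage, and the counts and the `PSEUDOCOUNT` value are made up for the example.

```python
import math


def log2_fold_change(a, b, pseudocount=0.0):
    """Return log2((b + pseudocount) / (a + pseudocount))."""
    return math.log2((b + pseudocount) / (a + pseudocount))


# The same absolute difference (3 reads) at two expression levels.
low_a, low_b = 2, 5          # hypothetical low-count gene
high_a, high_b = 1000, 1003  # hypothetical high-count gene

raw_low = log2_fold_change(low_a, low_b)
raw_high = log2_fold_change(high_a, high_b)

# Hypothetical pseudocount: a crude stand-in for shrinkage, NOT the rlog itself.
PSEUDOCOUNT = 8.0
shrunk_low = log2_fold_change(low_a, low_b, PSEUDOCOUNT)
shrunk_high = log2_fold_change(high_a, high_b, PSEUDOCOUNT)

print(f"low-count gene:  raw {raw_low:.3f}, shrunk {shrunk_low:.3f}")
print(f"high-count gene: raw {raw_high:.3f}, shrunk {shrunk_high:.3f}")
```

The low-count gene's log2 fold change drops substantially once the pseudocount is added, while the high-count gene's value barely moves. That is the qualitative behaviour described above; rlog and the VST achieve it with a fitted model rather than a fixed constant.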

Michael Love (United States) wrote 2.2 years ago:

In addition to Ryan's suggestions, note that you can just run DESeq() to calculate fold changes. It will detect there are no replicates, and automatically perform "blind" dispersion estimation by treating the different samples as replicates (it will print a warning that this was done). At the results stage, you can use the moderated fold changes (log2FoldChange column) or if you set addMLE=TRUE to results(), you can compare the moderated fold changes to the MLE (maximum likelihood estimate / unmoderated) fold changes.

Xiaokuan Wei (United States) wrote 2.2 years ago:

Ryan, Michael:

Thank you for your informative answers to my question. I just have another question regarding this process.

As I understand it, the fold changes obtained from DESeq() are calculated from the raw counts rather than from normalized values (rlog or vsn).

So if I extract a normalized matrix and then calculate the fold changes between two samples, the results will be slightly different from the fold changes obtained by DESeq(). Is this right?

Thank you.

 

-W


The fold changes calculated by DESeq(), either the moderated (default) or unmoderated fold changes (using addMLE or betaPrior=FALSE), will not be the same as the ones obtained from rlog or VST data. The moderated fold changes are calculated as described in the paper. The unmoderated fold changes in a simple group comparison are equal to (if you allow some pseudo latex):

mean_{j in group B}(K_ij / s_j) / mean_{j in group A}(K_ij / s_j)
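
An illustrative sketch of this formula in plain Python (added for clarity, not DESeq2 code and not part of the original reply). Here K_ij is taken to be the raw count for gene i in sample j and s_j the size factor for sample j. The reply does not say how s_j is obtained, so the median-of-ratios estimate below is an assumption, as are the function names and the toy counts. The result is reported on the log2 scale.

```python
import math
from statistics import mean, median


def size_factors(counts):
    """Median-of-ratios size factors; counts[i][j] is gene i, sample j."""
    n_samples = len(counts[0])
    # Only genes with a nonzero count in every sample have a usable geometric mean.
    usable = [row for row in counts if all(k > 0 for k in row)]
    log_geo_means = [mean(math.log(k) for k in row) for row in usable]
    return [
        math.exp(median(math.log(row[j]) - lg
                        for row, lg in zip(usable, log_geo_means)))
        for j in range(n_samples)
    ]


def unmoderated_log2fc(gene_counts, s, group_a, group_b):
    """log2 of mean_{j in B}(K_ij / s_j) / mean_{j in A}(K_ij / s_j)."""
    norm = [k / sj for k, sj in zip(gene_counts, s)]
    mean_a = mean(norm[j] for j in group_a)  # must be > 0 for the ratio to exist
    mean_b = mean(norm[j] for j in group_b)
    return math.log2(mean_b / mean_a)


# Toy data: sample 1 (column 1) was sequenced twice as deeply as sample 0;
# only the last gene is truly changed.
counts = [[10, 20], [100, 200], [50, 100], [30, 240]]
s = size_factors(counts)
fc_unchanged = unmoderated_log2fc(counts[0], s, group_a=[0], group_b=[1])
fc_changed = unmoderated_log2fc(counts[3], s, group_a=[0], group_b=[1])
print(s, fc_unchanged, fc_changed)
```

With one sample per group, each mean is just that sample's normalized count, so the fold change reduces to the ratio of the two size-factor-scaled counts. Dividing by s_j removes the twofold depth difference before the ratio is taken.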

Reply written 2.2 years ago by Michael Love

Got it, I think so. Thank you Michael. -W

Reply written 2.2 years ago by Xiaokuan Wei