DESeq2: which normalized data matrix should I take?
Xiaokuan Wei ▴ 230
@xiaokuan-wei-4052
Last seen 8.4 years ago
United States

Hi,

I want to extract the normalized data matrix (read counts) and do the differential gene expression analysis myself instead of using the Wald test the package provides, because I don't have replicates in either comparison group. So there is no statistical testing, only a fold change between two samples. I am going to use the vsn-normalized matrix for this. My question is: what are the advantages and drawbacks of using a normalized data matrix instead of the raw counts? Which count matrix is appropriate for this type of analysis?

Thank you

-W

deseq2 • 4.1k views
@ryan-c-thompson-5618
Last seen 7 weeks ago
Icahn School of Medicine at Mount Sinai…

If you're just going to do a descriptive analysis using fold changes, then you probably want to do variance stabilization using the regularized log transformation in DESeq2, which gives you values on the log2 scale that are normalized for library size and variance-stabilized. See the help page for the rlog function. Since you have no replicates, you'll need to use blind=TRUE. You can also use varianceStabilizingTransformation for similar purposes, but it is more sensitive to differences in sequencing depth and sample complexity.

The primary effect of variance stabilization in RNA-seq data is to reduce the magnitude of fold changes for low-count genes. This counteracts the tendency of low-count genes to have very large fold changes (and high variance) since small random variations in low-count genes are larger relative to the counts themselves. You can see an example of how the regularization affects the data here: http://www.sthda.com/english/wiki/rna-seq-differential-expression-work-flow-using-deseq2#the-rlog-transform
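To illustrate the point about low-count genes, here is a minimal Python sketch. It is not rlog or the VST: it uses a simple shifted log, log2(count + pseudocount), as a crude stand-in, and the counts, gene names, and the pseudocount of 32 are all made up for illustration.

```python
import math

def log2_fold_change(a, b):
    """Plain log2 ratio of two (already depth-normalized) counts."""
    return math.log2(b / a)

def shifted_log2_fold_change(a, b, pseudocount=32.0):
    """Difference of log2(count + pseudocount).

    A crude stand-in for a variance-stabilizing transform,
    NOT the rlog or VST implemented in DESeq2.
    """
    return math.log2(b + pseudocount) - math.log2(a + pseudocount)

# Hypothetical counts for two genes in samples A and B.
genes = {
    "low_count_gene": (2, 8),
    "high_count_gene": (1000, 1100),
}

for name, (a, b) in genes.items():
    plain = log2_fold_change(a, b)
    damped = shifted_log2_fold_change(a, b)
    print(f"{name}: plain={plain:.3f} damped={damped:.3f}")
```

The low-count gene goes from a log2 fold change of 2 to roughly 0.23 after the transform, while the high-count gene barely moves (about 0.14 to 0.13). That is the qualitative behavior described above, although the real transformations fit the amount of shrinkage from the data rather than using a fixed pseudocount.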

@mikelove
Last seen 4 days ago
United States

In addition to Ryan's suggestions, note that you can just run DESeq() to calculate fold changes. It will detect that there are no replicates and automatically perform "blind" dispersion estimation by treating the different samples as replicates (it will print a warning that this was done). At the results stage, you can use the moderated fold changes (the log2FoldChange column), or, if you pass addMLE=TRUE to results(), you can compare the moderated fold changes to the MLE (maximum likelihood estimate, i.e. unmoderated) fold changes.

Xiaokuan Wei ▴ 230
@xiaokuan-wei-4052
Last seen 8.4 years ago
United States

Ryan, Michael:

Thank you for your informative answers to my question. I have one more question about this process.

As I understand it, the fold changes obtained from DESeq() are computed from the raw counts rather than from a normalized matrix (rlog or vsn).

So if I extract a normalized matrix and then calculate the fold changes between two samples, the results will be slightly different from the fold changes obtained by DESeq(). Is this right?

Thank you.

-W


The fold changes calculated by DESeq(), either the moderated (default) or unmoderated fold changes (using addMLE or betaPrior=FALSE), will not be the same as the ones obtained from rlog or VST data. The moderated fold changes are calculated as described in the paper. The unmoderated fold changes in a simple group comparison are equal to (if you allow some pseudo latex):

mean_{j in group B}(K_ij / s_j) / mean_{j in group A}(K_ij / s_j)
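
In the expression above, K_ij is the raw count of gene i in sample j and s_j is the size factor of sample j. The following is a minimal Python sketch of that calculation, not DESeq2 code. The gene names, counts, and function names are made up, and the size factors are computed with a plain median-of-ratios estimate, which is an assumption about how s_j is obtained.

```python
import math
from statistics import mean, median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative re-implementation).

    counts maps gene name -> list of raw counts, one per sample.
    Genes with a zero in any sample are skipped.
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean of each usable gene across samples.
    log_geo = {
        g: mean(math.log(k) for k in row)
        for g, row in counts.items()
        if all(k > 0 for k in row)
    }
    return [
        math.exp(median(math.log(counts[g][j]) - log_geo[g] for g in log_geo))
        for j in range(n_samples)
    ]

def unmoderated_fold_change(row, sf, group_a, group_b):
    """mean over B of K_ij / s_j, divided by mean over A of K_ij / s_j."""
    mean_a = mean(row[j] / sf[j] for j in group_a)
    mean_b = mean(row[j] / sf[j] for j in group_b)
    return mean_b / mean_a

# Hypothetical raw counts: column 0 is sample A, column 1 is sample B.
# Sample B is sequenced about twice as deeply as sample A.
counts = {
    "gene_a": [10, 20],
    "gene_b": [100, 200],
    "gene_c": [50, 400],
    "gene_d": [30, 60],
}

sf = size_factors(counts)
for gene, row in counts.items():
    fc = unmoderated_fold_change(row, sf, group_a=[0], group_b=[1])
    print(f"{gene}: fold change = {fc:.2f}, log2 = {math.log2(fc):.2f}")
```

With one sample per group, as in this thread, each mean collapses to a single value, so the fold change is simply (K_iB / s_B) / (K_iA / s_A). In the toy data, gene_c has a raw ratio of 8 but a fold change of 4 once the twofold depth difference is divided out, and the other genes come out at 1.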


Got it, I think so. Thank you Michael. -W
