Statistics on normalized counts
Justin • 0
@88cc769a

DESeq2 requires normalized counts. Suppose you have a public dataset where only normalized count data, for example TPM, is available. In that case, would it be at all correct to compare the normalized gene counts between conditions using a statistical test? And what would be the reasoning for or against doing so? I don't think it is correct, but I want to double-check with the community, because in the literature I've seen several studies where normalized counts are plotted and compared between conditions (outside of DESeq2 or another DE method), despite the documentation of RNA-seq methods advising against it.

DESeq2 • 178 views
@mikelove

DESeq2 requires the original counts and should not be run on normalized (scaled) counts. This has been discussed numerous times on the support site already. The counts provide information about the precision of the measurements.
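
The point about precision can be illustrated with a minimal sketch. It assumes pure Poisson sampling noise, which is a simplification and not how DESeq2 models the data, and it uses counts per million (CPM) as a stand-in for any scaled value. The gene, the two samples, and the helper names `cpm` and `poisson_cv` are hypothetical.

```python
import math

def cpm(count, library_size):
    """Counts per million: a simple depth-scaled expression value."""
    return count * 1e6 / library_size

def poisson_cv(count):
    """Relative standard deviation (sd / mean) of a Poisson count with this mean."""
    return 1.0 / math.sqrt(count)

# Hypothetical gene observed in two hypothetical samples of different depth.
shallow = {"count": 10, "library_size": 1_000_000}
deep = {"count": 1000, "library_size": 100_000_000}

for name, sample in (("shallow", shallow), ("deep", deep)):
    scaled = cpm(sample["count"], sample["library_size"])
    rel_noise = poisson_cv(sample["count"])
    # Same scaled value in both samples, but very different relative noise.
    print(f"{name}: CPM = {scaled:.1f}, Poisson CV = {rel_noise:.3f}")
```

Both samples report the same scaled value, yet the count of 10 is roughly ten times noisier in relative terms than the count of 1000. Once only the scaled value is kept, that difference is no longer visible.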

If you only have TPM and no access to the counts or the sequencing depth, go with a different method.
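
As a rough sketch of why TPM alone does not give back the counts or the sequencing depth, the snippet below applies the usual TPM definition (length-normalized rate, rescaled to sum to one million within each sample). The three genes, their lengths, the counts, and the function name `tpm` are made up for illustration; real pipelines typically use effective transcript lengths.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from raw counts and gene lengths in kilobases."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    total = sum(rates)
    return [r * 1e6 / total for r in rates]              # rescale to sum to 1e6

# Hypothetical three-gene example.
lengths_kb = [1.0, 2.0, 4.0]
sample_a = [10, 40, 50]                  # shallow sample
sample_b = [c * 100 for c in sample_a]   # same composition, 100x deeper

tpm_a = tpm(sample_a, lengths_kb)
tpm_b = tpm(sample_b, lengths_kb)

# The two samples are indistinguishable once expressed as TPM.
for a, b in zip(tpm_a, tpm_b):
    print(f"{a:.1f}  {b:.1f}")
```

Because every sample is rescaled to the same total, the depth of the library cancels out of the result, so it cannot be read back from a TPM table.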


Maybe I did not formulate my question precisely enough: I have no intention of running DESeq2 on anything other than raw counts. Could you point me to a suitable method for analyzing TPMs?


I think the OP meant to type "DESeq2 requires raw counts" in the first sentence. I have the same question about running basic statistical tests on counts that have NOT been normalized using DESeq2 (which performs its own stats), but have been normalized using some other method. I've seen some recent papers in which the authors plot "normalized expression" (normalized counts as log2TPM or something similar) and then run t-tests to determine significance. Is this correct/the best way to do this? If not, how do you determine whether gene expression is significantly different using normalized transcript counts, e.g. data downloaded from TCGA or data deposited as TPMs or CPMs, which are apparently not appropriate for differential expression analysis (https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html)?

Here is an example: https://www.nature.com/articles/s41598-021-85993-x/figures/2 from https://www.nature.com/articles/s41598-021-85993-x


I don't have any recommendations for you here.

I really recommend using counts and one of the methods in Bioconductor that takes count precision into account (DESeq2, edgeR, limma-voom, etc.). It does make a difference.

"log2TPM or something similar and then run t-tests to determine significance. Is this correct/the best way to do this?"

I don't recommend this.


Thank you!


Thanks for your help!
