DESeq2 requires normalized counts. Suppose you have a public dataset where only normalized count data, for example TPM, is available. In that case, would it be at all correct to compare the normalized gene counts between conditions using a statistical test? And what would be the reasoning for or against doing so? I don't think it is correct, but I want to double-check with the community: in the literature I've seen several studies where normalized counts are plotted and compared between conditions (outside of DESeq2 or another DE method), despite the documentation of RNA-seq methods advising against it.
Maybe I have not formulated my question precisely enough: I have no intention of running DESeq2 on anything but raw counts. Could you point me to a suitable method for analyzing TPMs?
I think the OP meant to type "DE-Seq2 requires raw counts" in the first sentence. I have the same question about running basic statistical tests on counts that have NOT been normalized using DESeq2 (which performs its own stats), but have been normalized using some other method. I've seen some recent papers in which the authors plot "normalized expression" (normalized counts as log2TPM or something similar) and then run t-tests to determine significance. Is this correct/the best way to do this? If not, how do you determine whether gene expression is significantly different using normalized transcript counts (e.g. data downloaded from the TCGA, or data deposited as TPMs or CPMs, which are apparently not appropriate for differential expression analysis: https://hbctraining.github.io/DGE_workshop/lessons/02_DGE_count_normalization.html)?
Here is an example: https://www.nature.com/articles/s41598-021-85993-x/figures/2 from https://www.nature.com/articles/s41598-021-85993-x
I don't have any recommendations for you here.
I really recommend using counts and one of the methods in Bioconductor that takes count precision into account (DESeq2, edgeR, limma-voom, etc.). It does make a difference.
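To illustrate what "count precision" refers to, here is a minimal Python sketch, not anything taken from DESeq2, edgeR, or limma-voom. The gene lengths, the counts, and the helper `tpm` are hypothetical, and the uncertainty estimate assumes pure Poisson sampling noise, which is a simplification of the models those packages fit. It shows that a shallow and a deep library with the same composition give identical TPMs, although the small counts are far noisier.

```python
import math

def tpm(counts, lengths_kb):
    """Transcripts per million: divide counts by gene length, then scale to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Hypothetical gene lengths (in kb) and raw counts, made up for illustration.
lengths_kb = [1.0, 2.0, 4.0]
shallow = [10, 40, 160]               # shallowly sequenced library
deep = [c * 100 for c in shallow]     # same composition, 100x more reads

tpm_shallow = tpm(shallow, lengths_kb)
tpm_deep = tpm(deep, lengths_kb)

# Assumption: Poisson noise only, so the relative uncertainty of a count c
# is roughly 1 / sqrt(c).
cv_shallow = [1 / math.sqrt(c) for c in shallow]
cv_deep = [1 / math.sqrt(c) for c in deep]

for i in range(len(lengths_kb)):
    print(f"gene {i}: TPM shallow={tpm_shallow[i]:.1f} deep={tpm_deep[i]:.1f} | "
          f"relative noise shallow={cv_shallow[i]:.3f} deep={cv_deep[i]:.3f}")
```

Once the data are reduced to TPM, the two libraries are indistinguishable, so a t-test on log2TPM has no way to weight the noisier values less. Count-based methods keep that information.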
"log2TPM or something similar and then run t-tests to determine significance. Is this correct/the best way to do this?"
I don't recommend this.
Thank you!
Thanks for your help!