Hi,
I'm currently doing differential expression (DE) analysis on some RNA-seq data. We have both a raw count matrix and TPM values for all samples, and both come from the same set of samples. We first ran a DE analysis with DESeq2 on the raw counts and got some results. Then, to make sure the signals we found were real, we also ran limma on the TPM data. However, when we compared the two sets of results, they were not consistent.

We expected that, to some extent, genes that are significant in DESeq2 would also have low p-values in limma, or that their p-values would at least be on the same scale (e.g. both around 10^-9), or at least that we would see some similar patterns. Instead, the pattern looks essentially random for almost all models, and we cannot find consistent results. For one model, the most significant gene in DESeq2 is GeneA (p = 2.22e-14), whereas the most significant gene in limma for the same model is GeneB (p = 1.94e-253), and GeneA has p = 0.961 in limma. In addition, limma reports far more significant genes than DESeq2. So the two packages lead us to different conclusions.
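One property worth keeping in mind when comparing the two inputs: TPM rescales every sample so that its values sum to one million, so it no longer records how many reads a gene actually received. Count-based models such as DESeq2 (and limma after voom) use that depth information to judge how precise each measurement is. The toy sketch below, with made-up counts and gene lengths (assumptions, not data from this thread), shows two samples whose read depth differs tenfold yet whose TPM profiles are identical. This is offered as one plausible contributor to the discrepancy, not a diagnosis.

```python
import math

def tpm(counts, lengths_kb):
    """Transcripts per million for one sample.

    counts: raw read counts per gene
    lengths_kb: gene (or effective transcript) lengths in kilobases
    """
    rpk = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    per_million = sum(rpk) / 1e6                       # per-sample scaling factor
    return [r / per_million for r in rpk]

# Hypothetical toy data: sample_b has exactly 10x the reads of sample_a.
lengths_kb = [1.0, 2.0, 0.5]
sample_a = [10, 40, 5]
sample_b = [100, 400, 50]

tpm_a = tpm(sample_a, lengths_kb)
tpm_b = tpm(sample_b, lengths_kb)

print("total reads:", sum(sample_a), "vs", sum(sample_b))
print("same TPM profile:", all(math.isclose(a, b) for a, b in zip(tpm_a, tpm_b)))
```

A method fed only `tpm_a` and `tpm_b` cannot tell that one sample had 10 reads where the other had 100, which is exactly the information a count model uses to separate noise from signal.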
So my questions are:
- Should we expect a similar pattern between limma (TPM) and DESeq2 (counts)?
- If so, what kinds of problems could cause this inconsistency?
- Why are the p-values from DESeq2 and limma so different?
- What other ways are there to double-check that the results are correct?
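For the last question, one simple sanity check is to compare the two gene rankings directly rather than individual p-values: how much the top-N lists overlap (Jaccard index) and how well the p-value ranks agree across genes tested by both methods (Spearman correlation). The sketch below assumes each result table has been reduced to a dict of gene ID to p-value; that input format, the cutoff N, and the gene names are hypothetical, and the p-values are toy numbers, not real results. The Spearman function uses the no-ties formula, so heavily tied p-values (e.g. many at 1.0) would call for a tie-aware implementation instead.

```python
def top_genes(pvalues, n):
    """Set of the n genes with the smallest p-values.

    pvalues: dict mapping gene ID -> p-value (hypothetical input format)
    """
    return set(sorted(pvalues, key=pvalues.get)[:n])

def jaccard(a, b):
    """|A intersect B| / |A union B|; 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def ranks(values):
    """0-based ranks; ties are broken by position (no averaging)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation using the no-ties formula."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical toy p-values, not real results.
deseq = {"gene1": 1e-10, "gene2": 1e-6, "gene3": 0.01, "gene4": 0.2, "gene5": 0.8}
limma = {"gene1": 1e-8, "gene2": 0.04, "gene3": 1e-5, "gene4": 0.5, "gene5": 0.9}

shared = sorted(deseq.keys() & limma.keys())  # genes tested by both methods
overlap = jaccard(top_genes(deseq, 2), top_genes(limma, 2))
rho = spearman([deseq[g] for g in shared], [limma[g] for g in shared])
print("top-2 Jaccard:", overlap)
print("Spearman rho:", rho)
```

Restricting to `shared` matters because the two packages may filter different genes before testing; comparing only genes that both methods actually tested keeps the comparison fair.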
Thanks!!
I would never expect programs to agree on all genes, but I'll leave it to the developers of the packages in question to give more comprehensive answers if they choose. A couple of comments:
Thanks! I'll try the limma-voom workflow on the count data first and see whether the two results are similar. Thank you for the advice!
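For context on why limma-voom on counts is a closer comparison to DESeq2 than limma on TPM: voom starts from the raw counts, converts each library to log2 counts per million (CPM) with a small offset, then fits a mean-variance trend across genes and turns it into per-observation precision weights that limma uses in its linear model. A minimal Python sketch of just the transformation step is below. It assumes the library size is the plain column total, which ignores normalization factors such as TMM, and it does not show the mean-variance fit or the weights.

```python
import math

def voom_log_cpm(counts, lib_size=None):
    """log2 counts per million with the offsets voom applies before weighting.

    counts: raw counts for one sample
    lib_size: library size; defaults to the column total (this sketch
              ignores TMM or other normalization factors)
    """
    if lib_size is None:
        lib_size = sum(counts)
    # +0.5 keeps zero counts finite; +1 on the library size keeps values below log2(1e6)
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in counts]

sample = [0, 10, 990]  # hypothetical counts; library size 1000
print(voom_log_cpm(sample))
```

Because the library size enters the calculation, a gene with few reads ends up with a low log-CPM and, in the full voom procedure, typically a lower weight; TPM input to plain limma carries no such information.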