Question

Comparing DESeq2 and limma, results not consistent

yuemolinn • 0

Hi,

I'm currently doing a DE analysis on some RNA-seq data. We have both a raw count matrix and TPM values for all samples (both come from the same set of samples). First, we ran DESeq2 on the raw counts and got some results. Then, to check that the signals we found were real, we also ran limma on the TPM values. When we compared the two sets of results, they were not consistent.

We expected that genes significant in DESeq2 would also have low p-values in limma, or that the p-values would at least be on a similar scale (both around 10^-9, for example), or that we would see some similar pattern. Instead, the results look unrelated for almost all models. For one model, the most significant gene in DESeq2 is "GeneA" with p = 2.22E-14, whereas the most significant gene in limma is "GeneB" with p = 1.94e-253; the limma p-value for GeneA is 0.961. limma also reports far more significant genes than DESeq2. The two programs lead us to different conclusions.

So my questions are:

Should we expect similar results from limma (TPM) and DESeq2 (counts)?
If so, what kinds of problems could cause this inconsistency?
Why are the p-values from DESeq2 and limma so different?
What other ways are there to double-check that the results are correct?

Thanks!!

limma deseq2 Tutorial • 3.1k views

updated 3.6 years ago by Gordon Smyth 50k • written 3.6 years ago by yuemolinn • 0

I would never expect programs to agree on all genes, but I will leave it to the developers of the packages in question to provide more comprehensive answers if they choose. A couple of comments:

[A] for such a question, you really need to be posting the code that you used
[B] if you have raw counts, you should be using the limma-voom workflow on those counts, not limma directly on the TPM data. If you insist on using TPMs, then you could adopt the limma-trend workflow.
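
To make point [B] a little more concrete: voom starts from log2 counts per million computed from the raw counts and the library sizes, and only then estimates a mean-variance trend and precision weights. The following is a minimal Python sketch of that first transform only, not the limma code itself. The function name `log_cpm` and the toy counts are hypothetical; the offsets 0.5 and 1 are the ones described for voom.

```python
import math

def log_cpm(counts):
    """log2 counts-per-million for one sample, given its raw counts.

    Offsets follow the voom description:
    log2((count + 0.5) / (library_size + 1) * 1e6).
    """
    lib_size = sum(counts)
    return [math.log2((c + 0.5) / (lib_size + 1.0) * 1e6) for c in counts]

# Hypothetical toy data: same four genes, about 10x different depth.
shallow = [0, 5, 45, 950]
deep = [0, 50, 450, 9500]

lcpm_shallow = log_cpm(shallow)
lcpm_deep = log_cpm(deep)

for name, values in (("shallow", lcpm_shallow), ("deep", lcpm_deep)):
    print(name, [round(v, 2) for v in values])
```

The transform needs the raw counts and the library size, which is why the count matrix, rather than a TPM table, is the natural input. In practice this step is done inside the R packages rather than by hand.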

3.6 years ago Kevin Blighe ★ 3.9k

Thanks! Let me try using the limma-voom workflow on counts data first, see if the two are similar. Thank you for the advice!
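
Once both result tables exist, "similar" could be quantified rather than judged by eye. Below is a minimal Python sketch, assuming each method's p-values are available as a dictionary keyed by gene ID. The helper names (`ranks`, `spearman`, `top_overlap`) and all p-values are hypothetical and are not taken from this thread; the rank correlation shown is the simple tie-free formula.

```python
def ranks(values):
    """Rank values from smallest (rank 1) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation for tie-free data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def top_overlap(p_a, p_b, k):
    """Number of genes shared between the k smallest p-values of each method."""
    top_a = set(sorted(p_a, key=p_a.get)[:k])
    top_b = set(sorted(p_b, key=p_b.get)[:k])
    return len(top_a & top_b)

# Hypothetical p-values keyed by gene ID.
p_deseq2 = {"g1": 1e-12, "g2": 3e-6, "g3": 0.004, "g4": 0.20, "g5": 0.75}
p_voom = {"g1": 5e-10, "g2": 0.001, "g3": 2e-5, "g4": 0.35, "g5": 0.60}

genes = sorted(p_deseq2)
rho = spearman([p_deseq2[g] for g in genes], [p_voom[g] for g in genes])
shared = top_overlap(p_deseq2, p_voom, k=3)
print(f"Spearman rho = {rho:.2f}, shared genes in top 3 = {shared}")
```

Comparing gene rankings is usually more informative than comparing raw p-value magnitudes, since two methods can order genes similarly while reporting p-values on different scales.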

3.6 years ago yuemolinn • 0

score 6 · Answer 1 · 2020-09-27

It is pretty much impossible for limma and DESeq2 to disagree to the extent that you report if both are used correctly, so we have to conclude that you have made one or more mistakes in your analyses.

One mistake is using TPM, since limma is not designed to analyse TPM values. There could be any number of other mistakes but we can't tell because you haven't provided any code or information.

It isn't clear to me what you mean by having TPM values. TPM values are intended for transcripts, not for genes, whereas limma and DESeq2 analyse gene-level read counts. I don't know of a context in which it makes sense to have both TPM values and a matrix of raw read counts for the same genomic features.
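
As a small illustration of why TPM values and raw counts are not interchangeable inputs, here is a minimal Python sketch of the usual TPM calculation. The function name `tpm`, the feature lengths and the counts are all hypothetical. It shows that two samples with the same composition but 100-fold different sequencing depth produce identical TPM values, so the information about how many reads support each value is no longer available to the downstream model.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million for one sample.

    counts: raw read counts per feature.
    lengths_kb: feature lengths in kilobases.
    """
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Hypothetical toy data: same composition, 100x different depth.
lengths_kb = [1.0, 2.0, 4.0]
low_depth = [10, 40, 40]
high_depth = [1000, 4000, 4000]

tpm_low = tpm(low_depth, lengths_kb)
tpm_high = tpm(high_depth, lengths_kb)

print("low depth :", tpm_low)
print("high depth:", tpm_high)
```

Count-based methods rely on the size of the counts to judge how precisely each value is measured, which a TPM table by construction does not retain.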

Please have a read of the posting guide to see what is usually expected of a question on this forum. This forum is intended to help with code and users usually post the code that they want help with.

You ask how to make sure your results are correct. The only answer I can give you is to follow the examples, recommendations and documentation provided by the software tools.