2.0 years ago by
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
There are a number of reasons why the different packages will give different fold change estimates. It's pretty clear what the culprit is in your case, but I'll list all the possible reasons before I explain that:
- Normalization. You have to apply the same normalization method and fit the same model in each package, otherwise the different packages will be estimating different things.
- Arithmetic. There is an arithmetic difference between the linear model fold changes computed by limma and the negative binomial generalized linear model fold changes computed by edgeR and DESeq2. This is the effect that you asked about and which Aaron has discussed. As Aaron explained, the differences are usually either small or not important.
- Shrinkage. None of the three packages report "raw" fold changes (unless you ask them to). All three packages instead shrink the log fold changes towards zero and the default amount of shrinkage is different in the three packages. The defaults in limma and edgeR are similar, with limma doing a bit more shrinkage than edgeR. The shrinkage that edgeR does is user-settable although I assume you have used the default. DESeq2 might do a lot more shrinkage if you ask it to estimate the amount of shrinkage to do. This is the effect that was discussed in the previous question & answer that you gave the link to.
- Dispersion. All the packages estimate the dispersions somewhat differently and this also has an influence.
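To make the arithmetic and shrinkage points concrete, here is a toy Python sketch. It is not what limma, edgeR or DESeq2 actually compute (their calculations are more involved); the function names, the counts and the use of a simple prior count as the shrinkage mechanism are all assumptions made purely for illustration.

```python
import math
from statistics import mean

def logfc_mean_of_logs(a, b, prior=0.0):
    """Toy 'linear model' style logFC: difference of mean log2 values."""
    la = mean(math.log2(x + prior) for x in a)
    lb = mean(math.log2(x + prior) for x in b)
    return lb - la

def logfc_log_of_means(a, b, prior=0.0):
    """Toy 'count model' style logFC: log2 ratio of mean counts.

    A positive prior count added to both means pulls the result towards zero.
    """
    return math.log2((mean(b) + prior) / (mean(a) + prior))

# Hypothetical counts for one gene in two groups of three samples.
group_a = [10, 20, 40]
group_b = [20, 80, 200]

# Arithmetic: the two definitions give different answers on the same data.
lm_style = logfc_mean_of_logs(group_a, group_b)
glm_style = logfc_log_of_means(group_a, group_b)

# Shrinkage: a larger prior count gives a logFC closer to zero.
shrunk_a_little = logfc_log_of_means(group_a, group_b, prior=5)
shrunk_a_lot = logfc_log_of_means(group_a, group_b, prior=20)

print(lm_style, glm_style, shrunk_a_little, shrunk_a_lot)
```

In this sketch the two unshrunk estimates differ only because one averages on the log scale and the other on the count scale, while the prior count changes the size of the estimate but never its sign.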
The most important factors, the ones that can potentially give big differences, are the first and third in the list above: normalization (1) and shrinkage (3). In your case it appears that the culprit is (1). Your log fold changes from limma are not shrunk (that is, closer to zero) relative to those from edgeR and DESeq2, but rather are substantially shifted (more negative, with smaller positive values and larger negative values). Such a shift in the logFCs can only occur because you have normalized differently or fitted a different model in limma compared to the other two packages.
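The distinction between a shift and shrinkage can be seen in a small Python sketch. This is only an illustration under simplifying assumptions (one sample per group, made-up counts, and a hypothetical `log_fold_changes` helper that divides by an effective library size); it does not reproduce any package's normalization.

```python
import math

def log_fold_changes(counts_a, counts_b, size_a, size_b):
    """Per-gene log2 fold change of B over A after scaling each sample
    by its (hypothetical) effective library size."""
    return [
        math.log2((b / size_b) / (a / size_a))
        for a, b in zip(counts_a, counts_b)
    ]

# Made-up counts for four genes in two samples.
counts_a = [100, 200, 50, 400]
counts_b = [200, 100, 50, 1600]

# Same data, two different normalizations of sample B.
logfc_norm1 = log_fold_changes(counts_a, counts_b, size_a=1.0, size_b=1.0)
logfc_norm2 = log_fold_changes(counts_a, counts_b, size_a=1.0, size_b=2.0)

# Every gene moves by the same amount: a shift, not a pull towards zero.
shifts = [y - x for x, y in zip(logfc_norm1, logfc_norm2)]
print(logfc_norm1, logfc_norm2, shifts)
```

Under the second normalization every logFC is lower by the same constant, so positive values get smaller and negative values get larger in magnitude. Shrinkage, by contrast, would move positive and negative values in opposite directions, both towards zero.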
In summary, the differences you see are not intrinsic to the packages but have almost certainly resulted from the way that you used the packages.
I don't think the differences are due to the two formulas you give. The computations done by the three packages are actually more complicated than these formulas and none of the three packages uses the formula that you give as the "actual" logFC.
People often make the mistake of posting on this support site results that they got from different packages and asking why the numbers are different. This overlooks the fact that the packages are not canned analyses but rather flexible pipelines with lots of options. It isn't meaningful to give results from a package without explaining the details of how the package was used, so you have to give all the steps and options chosen (e.g., by posting detailed code) leading up to the results that are presented.