Question: kegga() p-value calculated differently in different edgeR versions
2.2 years ago by
cronanz0
cronanz0 wrote:

Hi,

I was wondering whether anyone could shed light on this discrepancy I observed between two different edgeR versions.

I did all my gene set enrichment analysis with edgeR version 3.12.1, but I recently updated to edgeR 3.18 and noticed that the p-values from kegga() differ between the two versions.

The p-values from v3.12 are lower than those from v3.18, and the p-values from v3.18 seem to be closer to the actual p-values from Fisher's exact test.

The input data and the parameters were identical. The only difference is the edgeR version.

Any ideas (from the authors)? Thank you.

edger kegga • 518 views
modified 2.2 years ago by Gordon Smyth38k • written 2.2 years ago by cronanz0
Answer: kegga() p-value calculated differently in different edgeR versions
2.2 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

There hasn't been any change in edgeR's kegga wrappers in the past few years. However, there have been some small changes in the limma kegga code on which edgeR relies. For example, the NEWS for the latest limma version mentions a bug fix to kegga, which may contribute to the changes in results. Also, the gene sets are downloaded from the KEGG website, which gets updated for new genes, etc. between releases. It's hard to know the exact cause without doing some forensic bioinformatics; if you must find out, you'll have to install the old versions of the packages and try to recover the original results. (This is best done on a separate R installation, otherwise the old and new package versions will conflict with each other.)

The kegga bug fix in the latest version of limma doesn't affect the usual analysis pipelines. It only comes into play if OP has explicitly specified the gene universe to kegga (which he or she is very unlikely to have done) and the universe contains duplicates.

Answer: kegga() p-value calculated differently in different edgeR versions
2.2 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

Actually, there hasn't been any important change to kegga() in either edgeR or limma. However, kegga() reads the KEGG pathways directly from the KEGG website every time you run it. If you run kegga() two years apart, as you seem to have done here, then the KEGG pathways themselves are likely to have changed over that time, and the results will change because of that.

You also need to check that the differential expression results you are feeding to kegga() haven't changed in the past 2 years.

Assuming that you are not asking kegga() to make a length bias correction, then the p-values from kegga() agree exactly with Fisher's exact test, and this has been true of all versions of edgeR and limma. If you don't agree with this, then you need to provide an example!
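
To make the comparison concrete, here is a minimal sketch of the test in question, written in plain Python rather than R so that it has no package dependencies. It assumes the one-sided (over-representation) form of Fisher's exact test, which is the upper tail of the hypergeometric distribution, and it assumes no length bias correction. The function name, argument names and gene counts are all hypothetical and are not taken from limma or edgeR.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_pathway, n_de, n_de_in_pathway):
    """One-sided Fisher's exact test for over-representation.

    Probability of seeing at least `n_de_in_pathway` pathway genes among
    `n_de` genes drawn without replacement from a universe of `n_universe`
    genes, of which `n_in_pathway` belong to the pathway.
    """
    max_overlap = min(n_in_pathway, n_de)
    n_outside = n_universe - n_in_pathway
    # Count the DE lists of size n_de with overlap k, for every k at least
    # as large as the observed overlap.
    favourable = sum(
        comb(n_in_pathway, k) * comb(n_outside, n_de - k)
        for k in range(n_de_in_pathway, max_overlap + 1)
    )
    return favourable / comb(n_universe, n_de)


# Hypothetical toy numbers: 20 genes in the universe, 4 of them DE,
# and 2 of the DE genes fall in the pathway.
p_old = enrichment_p_value(20, 5, 4, 2)  # pathway annotated with 5 genes
p_new = enrichment_p_value(20, 8, 4, 2)  # same DE list, pathway now has 8 genes

print(p_old, p_new)
```

With the same DE list and the same overlap, the p-value rises when the pathway annotation grows from 5 to 8 genes. That is the kind of shift you would expect if the KEGG pathway definitions changed between the two runs, even though the test itself is unchanged.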