Question

kegga() p-value calculated differently in different edgeR versions

cronanz • 0

Hi,

I was wondering whether anyone could shed light on this discrepancy I observed between two different edgeR versions.

I used edgeR version 3.12.1 for all of my gene set enrichment analysis, but I recently updated to edgeR 3.18 and realised that the p-values generated by kegga() differ between v3.12 and v3.18.

The p-values appear to be lower with v3.12 than with v3.18, but the p-values from v3.18 seem to be closer to the actual p-values from Fisher's exact test.

There was no difference in the input data or the parameters. The only difference is the edgeR version.

Any ideas (from the authors)? Thank you.

edger kegga • 1.8k views

updated 8.4 years ago by Gordon Smyth • written 8.5 years ago by cronanz

score 0 · Answer 1 · 2017-06-26

There hasn't been any change in edgeR's kegga wrappers in the past few years. However, there have been some small changes in the limma kegga code on which edgeR relies. For example, the NEWS for the latest limma version lists a bug fix to kegga, which may contribute to the changes in results. Also, the gene sets are downloaded from the KEGG website, which gets updated for new genes, etc. between releases. It's hard to know the exact cause without doing some forensic bioinformatics; if you must find out, you'll have to install the old versions of the packages and try to recover the original results. (This is best done on a separate R installation, otherwise the old and new versions will interfere with each other.)
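If you do go down the forensic route, it helps to have the package versions stored next to each set of results. A minimal sketch in base R, assuming you only care about edgeR and limma (the `pkgs` vector and the `run_info` list are illustrative names, not part of either package):

```r
# Packages whose versions we want to record (adjust as needed).
pkgs <- c("edgeR", "limma")

# Check which of them are installed, without attaching or loading them.
is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))

# Collect the installed versions as a named character vector.
versions <- vapply(pkgs[is_installed],
                   function(p) as.character(packageVersion(p)),
                   character(1))

# Keep the versions together with the date of the run, e.g. to save with the results.
run_info <- list(date = Sys.Date(), versions = versions)
print(run_info)
```

Saving something like `run_info` (or the full output of `sessionInfo()`) with every analysis makes it much easier to tell later which installation produced which p-values.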

score 0 · Answer 2 · 2017-06-27

Actually, there hasn't been any important change to kegga() in either edgeR or limma. However, kegga() reads the KEGG pathways directly from the KEGG website every time you run it. If you run kegga() two years apart, as you seem to have done here, then the KEGG pathways themselves are likely to have changed over that time, and the results will change because of that.
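One way to guard against this in future is to download the KEGG annotation once, save it, and pass the saved copy to kegga() through its gene.pathway and pathway.names arguments. The sketch below assumes a reasonably recent limma that provides getGeneKEGGLinks() and getKEGGPathwayNames(); the helper `kegg_snapshot()`, its `file` and `fetch` arguments, and the file name in the usage comment are hypothetical names made up for illustration.

```r
# Return a cached KEGG annotation snapshot, downloading it only if `file` is absent.
# `fetch` is a function(species) returning a list with gene.pathway and pathway.names;
# by default it calls limma's download helpers (requires limma and network access).
kegg_snapshot <- function(file, species.KEGG = "hsa", fetch = NULL) {
  if (file.exists(file)) {
    return(readRDS(file))
  }
  if (is.null(fetch)) {
    fetch <- function(sp) list(
      gene.pathway  = limma::getGeneKEGGLinks(species.KEGG = sp),
      pathway.names = limma::getKEGGPathwayNames(species.KEGG = sp,
                                                 remove.qualifier = TRUE)
    )
  }
  # Store the download date with the annotation so the snapshot is self-describing.
  snap <- c(fetch(species.KEGG), list(downloaded = Sys.Date()))
  saveRDS(snap, file)
  snap
}

# Usage (not run here: needs limma, network access, and a vector of DE gene IDs,
# here called `de_genes`, which is a placeholder):
# snap <- kegg_snapshot("kegg_hsa_snapshot.rds")
# res  <- limma::kegga(de_genes, species.KEGG = "hsa",
#                      gene.pathway  = snap$gene.pathway,
#                      pathway.names = snap$pathway.names)
```

With a saved snapshot, rerunning the analysis later uses the same pathway definitions, so any remaining difference has to come from the software or the input gene list.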

You also need to check that the differential expression results you are feeding to kegga() haven't changed in the past 2 years.

Assuming that you are not asking kegga() to make a length bias correction, then the p-values from kegga() agree exactly with Fisher's exact test, and this has been true of all versions of edgeR and limma. If you don't agree with this, then you need to provide an example!
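To check the agreement for a single pathway by hand, you can compare the hypergeometric upper-tail probability with a one-sided Fisher's exact test in base R. This is only a sketch of the comparison, not the kegga() source, and the four counts below are invented for illustration; substitute the numbers for your own universe, DE list and pathway.

```r
# Hypothetical counts, for illustration only.
n_universe   <- 20000  # genes in the universe
n_de         <- 500    # differentially expressed genes in the universe
n_pathway    <- 120    # genes annotated to the pathway
n_de_pathway <- 12     # DE genes annotated to the pathway

# Over-representation p-value: P(X >= n_de_pathway) under the hypergeometric distribution.
p_hyper <- phyper(n_de_pathway - 1,
                  m = n_de, n = n_universe - n_de, k = n_pathway,
                  lower.tail = FALSE)

# The same test as a 2x2 table: rows = DE / not DE, columns = in pathway / not in pathway.
tab <- matrix(c(n_de_pathway,            n_pathway - n_de_pathway,
                n_de - n_de_pathway,     n_universe - n_de - n_pathway + n_de_pathway),
              nrow = 2,
              dimnames = list(c("DE", "notDE"), c("inPathway", "notInPathway")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# The two p-values should agree up to floating-point error.
print(c(hypergeometric = p_hyper, fisher = p_fisher))
print(all.equal(p_hyper, p_fisher))
```

Note that fisher.test() defaults to a two-sided alternative, which generally gives a different p-value from the one-sided over-representation test, so `alternative = "greater"` is needed for a like-for-like comparison.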