Question: kegga() p-value calculated differently in different edgeR versions
2.4 years ago, cronanz wrote:

Hi,

I was wondering whether anyone could shed light on this discrepancy I observed between two different edgeR versions.

I did all my gene set enrichment analysis with edgeR version 3.12.1. I recently updated edgeR to 3.18 and realised that the p-values generated by kegga() under v3.18 differ from those generated under v3.12.

The p-values from v3.12 are lower than those from v3.18, but the p-values from v3.18 seem to be closer to the actual p-values from Fisher's exact test.

There was no difference in the input data or the parameters. The only difference is the edgeR version.

Any ideas (from the authors)? Thank you.

Tags: edger, kegga
Answer: kegga() p-value calculated differently in different edgeR versions
2.4 years ago, Aaron Lun (Cambridge, United Kingdom) wrote:

There hasn't been any change in edgeR's kegga wrappers in the past few years. However, there have been some small changes in the limma kegga code on which edgeR relies. For example, the NEWS for the latest limma version mentions a bug fix to kegga, which may contribute to the changes in results. Also, the gene sets are downloaded from the KEGG website, which gets updated for new genes, etc. between releases. It's hard to know the exact cause without doing some forensic bioinformatics; if you must find out, you'll have to install the old versions of the packages and try to recover the original results. (This is best done on a separate R installation, otherwise you'll play havoc with the old and new versions.)


The kegga bug fix in the latest version of limma doesn't affect the usual analysis pipelines. It only comes into play if OP has explicitly specified the gene universe to kegga (which he or she is very unlikely to have done) and the universe contains duplicates.

(Comment written 2.4 years ago by Gordon Smyth)
Answer: kegga() p-value calculated differently in different edgeR versions
2.4 years ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

Actually, there hasn't been any important change to kegga() in either edgeR or limma. However, kegga() reads the KEGG pathways directly from the KEGG website every time you run it. If you run kegga() two years apart, as you seem to have done here, then the KEGG pathways themselves are likely to have changed over that time, and the results will change because of that.
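
To illustrate the point with made-up numbers: the sketch below is not kegga's code, just a toy over-representation test in plain Python. The gene IDs, the two pathway "snapshots" and the helper name `overrep_pvalue` are all hypothetical. It shows that the same list of DE genes gives a different p-value once the pathway's membership changes, even though the test itself is untouched.

```python
from math import comb

def overrep_pvalue(universe, pathway, de_genes):
    """Upper-tail hypergeometric p-value for pathway over-representation.

    Hypothetical helper for illustration; assumes no length bias correction.
    """
    universe = set(universe)
    pathway = set(pathway) & universe      # only annotated genes in the universe count
    de = set(de_genes) & universe
    overlap = len(pathway & de)            # DE genes that fall in the pathway
    N, m, n = len(universe), len(pathway), len(de)
    # P(X >= overlap) when n genes are drawn from N, of which m are in the pathway
    tail = sum(comb(m, k) * comb(N - m, n - k)
               for k in range(overlap, min(m, n) + 1))
    return tail / comb(N, n)

# Made-up data: 100 genes, 10 of them DE.
universe = [f"g{i}" for i in range(100)]
de_genes = universe[:10]

# The "same" pathway at two time points: the later snapshot has gained
# 10 extra member genes, none of which are DE.
pathway_old = universe[:5] + universe[50:55]
pathway_new = pathway_old + universe[60:70]

p_old = overrep_pvalue(universe, pathway_old, de_genes)
p_new = overrep_pvalue(universe, pathway_new, de_genes)
print(p_old, p_new)  # identical DE list, different p-values
```

In this toy case the overlap stays at 5 genes while the pathway doubles in size, so the p-value gets larger. Real KEGG updates can move p-values in either direction, since genes may be added to or removed from a pathway.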

You also need to check that the differential expression results you are feeding to kegga() haven't changed in the past 2 years.

Assuming that you are not asking kegga() to make a length bias correction, then the p-values from kegga() agree exactly with Fisher's exact test, and this has been true of all versions of edgeR and limma. If you don't agree with this, then you need to provide an example!
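
For anyone wanting to put together such an example, here is a minimal sketch of the comparison value, written in plain Python rather than R. It assumes the relevant test is the one-sided ("greater") Fisher's exact test on the 2x2 table of DE status against pathway membership, which in R would be `fisher.test(..., alternative = "greater")`. The function name `fisher_greater` and the argument layout are hypothetical choices made for this illustration.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table

                    in pathway   not in pathway
        DE              a              b
        not DE          c              d

    Returns P(X >= a) with all margins held fixed, i.e. the upper tail
    of the hypergeometric distribution.
    """
    n_de = a + b              # row margin: number of DE genes
    n_path = a + c            # column margin: pathway size
    n_total = a + b + c + d   # size of the gene universe
    tail = sum(comb(n_path, k) * comb(n_total - n_path, n_de - k)
               for k in range(a, min(n_de, n_path) + 1))
    return tail / comb(n_total, n_de)

# Toy table: 3 DE genes, 3 pathway genes, 10 genes in total, 1 gene in both.
p = fisher_greater(1, 2, 2, 5)
print(p)
```

If the counts taken from a kegga() result (pathway size, number of DE genes in the pathway, total DE genes, universe size) give a noticeably different value here, those counts would make a concrete example to post.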
