kegga() p-value calculated differently in different edgeR versions
cronanz • 0
@cronanz-12047
Last seen 5.6 years ago

Hi,

I was wondering whether anyone could shed light on this discrepancy I observed between two different edgeR versions.

I did all my gene set enrichment analysis with edgeR version 3.12.1, but I recently updated to edgeR 3.18 and realised that the p-values generated by kegga() differ between the two versions.

The p-values from v3.12 are lower than those from v3.18, but the p-values from v3.18 seem to be closer to the actual p-values from Fisher's exact test.

There was no difference in the input data or the parameters. The only difference is the edgeR version.

Any ideas (from the authors)? Thank you.

edger kegga • 1.6k views
Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay

There hasn't been any change in edgeR's kegga wrappers in the past few years. However, there have been some small changes in the limma kegga code on which edgeR relies. For example, the NEWS for the latest limma version mentions a bug fix to kegga, which may contribute to the changes in results. Also, the gene sets are downloaded from the KEGG website, which gets updated for new genes, etc. between releases. It's hard to know the exact cause without doing some forensic bioinformatics; if you must find out, you'll have to install the old versions of the packages and try to recover the original results. (This is best done on a separate R installation, otherwise you'll play havoc with the old and new versions.)
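
If you do go down the forensic route, it helps to record exactly which package versions produced each set of results. Here is a minimal sketch in base R; `report_versions` is a hypothetical helper name, not part of edgeR or limma.

```r
# Hypothetical helper: return the installed version of each package,
# or NA if the package is not installed in this R library.
report_versions <- function(pkgs = c("edgeR", "limma")) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Print the package versions and the R version for the current installation.
print(report_versions())
print(R.version.string)
```

Running this in both the old and the new R installation, and saving the output next to the kegga() results, makes it clear which combination of edgeR and limma produced which p-values.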


The kegga bug fix in the latest version of limma doesn't affect the usual analysis pipelines. It only comes into play if OP has explicitly specified the gene universe to kegga (which he or she is very unlikely to have done) and the universe contains duplicates.

@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Actually, there hasn't been any important change to kegga() in either edgeR or limma. However, kegga() reads the KEGG pathways directly from the KEGG website every time you run it. If you run kegga() two years apart, as you seem to have done here, then the KEGG pathways themselves are likely to have changed over that time, and the results will change because of that.
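
One way to stop the results drifting with the KEGG website is to download the annotation once, save it, and pass the saved copy back to kegga(). The sketch below assumes a recent limma that provides getGeneKEGGLinks() and getKEGGPathwayNames(), and assumes human data (`species.KEGG = "hsa"`), which the original question does not state. The function names `snapshot_kegg` and `kegga_from_snapshot`, and the file name, are hypothetical.

```r
# Hypothetical helper: download the current KEGG gene-to-pathway links and
# pathway names (needs internet access) and save them with a date stamp.
snapshot_kegg <- function(file, species.KEGG = "hsa") {
  snap <- list(
    gene.pathway  = limma::getGeneKEGGLinks(species.KEGG = species.KEGG),
    pathway.names = limma::getKEGGPathwayNames(species.KEGG = species.KEGG),
    species.KEGG  = species.KEGG,
    date          = Sys.Date()
  )
  saveRDS(snap, file)
  invisible(snap)
}

# Hypothetical helper: run kegga() against a saved snapshot instead of the
# live KEGG website. `de` is assumed to be a vector of Entrez Gene IDs.
kegga_from_snapshot <- function(de, file, ...) {
  snap <- readRDS(file)
  limma::kegga(de,
               species.KEGG  = snap$species.KEGG,
               gene.pathway  = snap$gene.pathway,
               pathway.names = snap$pathway.names,
               ...)
}

# Example usage (not run here; `de.genes` is a placeholder for your own IDs):
# snapshot_kegg("kegg_snapshot.rds")
# res <- kegga_from_snapshot(de.genes, "kegg_snapshot.rds")
# limma::topKEGG(res)
```

With the same snapshot file and the same list of DE genes, any remaining difference between two runs would have to come from the software rather than from KEGG.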

You also need to check that the differential expression results you are feeding to kegga() haven't changed in the past 2 years.

Assuming that you are not asking kegga() to make a length bias correction, then the p-values from kegga() agree exactly with Fisher's exact test, and this has been true of all versions of edgeR and limma. If you don't agree with this, then you need to provide an example!
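
To check this for a single pathway by hand, the comparison can be sketched in base R as follows. The counts are made up purely for illustration, and the sketch assumes the one-sided (over-representation) form of Fisher's exact test, with no length bias correction.

```r
# Made-up counts, for illustration only (not taken from any real analysis).
N <- 20000  # genes in the universe
K <- 150    # universe genes annotated to the pathway
n <- 500    # differentially expressed (DE) genes in the universe
k <- 12     # DE genes annotated to the pathway

# Upper-tail hypergeometric probability of seeing k or more DE genes
# in the pathway.
p.hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on the 2x2 table.
tab <- matrix(c(k, n - k, K - k, N - K - n + k), nrow = 2,
              dimnames = list(c("in pathway", "not in pathway"),
                              c("DE", "not DE")))
p.fisher <- fisher.test(tab, alternative = "greater")$p.value

# The two p-values agree up to numerical tolerance.
print(all.equal(p.hyper, p.fisher))
```

Plugging in the N, DE and universe counts for one pathway from each of your two runs shows directly whether the p-values differ because the counts differ (for example, because the pathway membership changed) or for some other reason.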
