Hi,
I am doing pathway analysis with clusterProfiler using the enrichGO function, and I am looking at the Biological Process results.
Since one of my co-workers told me about Enrichr (https://maayanlab.cloud/Enrichr/), I wanted to compare the results of the two methods. If you are not familiar with Enrichr, the GO pathway results can be found (after submitting your set of genes) in the "Ontologies" tab, where the first result is "GO Biological Process 2023".
The two methods give me different results, even if I set the same universe for both (on Enrichr you can specify the background by clicking on the red "background" word on top of the genes box). Every pathway in Enrichr seems to be in the enrichGO result table, but the converse is not true: some of my most significant pathways in the enrichGO results are not in the Enrichr results. Also, if a pathway is in both results, the number of genes involved in that pathway is greater in the clusterProfiler result than in the Enrichr result.
Since Enrichr specifies "2023" in the results title, I was wondering if clusterProfiler was using another GO database (older or maybe more recent). However, I read that clusterProfiler uses the GO.db library, and when I load GO.db in R, it tells me that the GO source date is July 27th, 2023, so I'm guessing it should be the same database in both methods.
So why is there a difference between the two methods?
Thank you in advance for your answers.
I think it is impossible to answer your question. The code of `enrichGO` is open, so you could check the details of each step of the analysis, but AFAIK this is not the case for Enrichr.
Having said this, your remark that the number of genes in the `enrichGO` output is larger than that of Enrichr might be explained by the fact that `enrichGO` uses the `GOALL` annotation (so all parent-child relationships of the hierarchy are taken into account), whereas according to their paper Enrichr seems to 'cut' the hierarchy at a specific level.
From the section 'Implementation' of the Enrichr paper:
The ontology category contains gene-set libraries created from the three gene ontology trees [6] and from the knockout mouse phenotypes ontology developed by the Jackson Lab from their MGI-MP browser [38]. To create such gene-set libraries, we "cut" the tree at either the third or fourth level and created a gene set from the terms and their associated genes downstream of the cut. The details about creating the Gene Ontology gene-set libraries are provided in our previous publication, Lists2Networks [24].
Also, when you use the current version of Bioconductor (`v3.19`), the source of `GO.db` is dated January 2024.
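To illustrate the `GOALL` point above, here is a minimal base-R sketch with a made-up three-term hierarchy. All term IDs, gene names, and helper names (`parents`, `direct`, `ancestors_of`) are hypothetical and only serve the illustration. It does not reproduce what `enrichGO` or Enrichr actually do internally; it only shows why counting genes after propagating annotations up the hierarchy can never give fewer genes per term than counting direct annotations alone.

```r
# Toy GO-like hierarchy: child term -> parent term(s). All IDs are made up.
parents <- list(
  "GO:C" = "GO:B",
  "GO:B" = "GO:A",
  "GO:A" = character(0)
)

# Direct annotations only: each gene is listed under the term it was
# explicitly annotated to.
direct <- list(
  "GO:A" = c("gene1"),
  "GO:B" = c("gene2"),
  "GO:C" = c("gene3", "gene4")
)

# Collect a term together with all of its ancestors by walking up the hierarchy.
ancestors_of <- function(term) {
  seen <- character(0)
  todo <- term
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      todo <- c(todo, parents[[current]])
    }
  }
  seen
}

# Propagate every gene to its own term and to every ancestor of that term
# (the idea behind a GOALL-style mapping).
propagated <- lapply(direct, function(x) character(0))
for (term in names(direct)) {
  for (anc in ancestors_of(term)) {
    propagated[[anc]] <- union(propagated[[anc]], direct[[term]])
  }
}

# Number of genes per term under each scheme.
direct_sizes <- lengths(direct)
propagated_sizes <- lengths(propagated)
print(rbind(direct = direct_sizes, propagated = propagated_sizes))
```

With real data, a similar comparison could be made for a single term by querying an OrgDb package (for example `org.Hs.eg.db`, assuming human genes) through `AnnotationDbi::select()` once with the `GO` keytype and once with the `GOALL` keytype, and comparing the number of genes returned.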