Gene set enrichment analysis with gseKEGG (clusterProfiler) returns very few enriched pathways
Emilia ▴ 30
@emiliabaffo

Hello! I have a list of about 6400 differentially expressed genes and I want to perform KEGG pathway enrichment analysis using the clusterProfiler package.

I first did it with the ORA method, using the enrichKEGG function and running it separately on the genes with positive and negative logFC. I got hundreds of enriched pathways with this method, both for the up- and downregulated genes.

I then tried the GSEA method using the gseKEGG function (in this case, of course, I didn't split the positive and negative logFC values, since the method already assigns a signed enrichment score). However, I only got 13 pathways as a result, half upregulated and half downregulated. I find that odd considering the number of enriched pathways I had obtained with the other method. I am aware that GSEA usually returns fewer results than ORA, but I had also analyzed my gene list using the GSEA desktop software and had gotten about 20 upregulated and 20 downregulated pathways.

Why could it be that I'm getting so few enriched pathways using the gseKEGG function? Should I just set a less stringent p-value cutoff, or is there something else I could try? I'm only changing a few parameters from their defaults, but I'll leave my code here in case you need to see it.


pathways_GSEA <- gseKEGG(geneList     = geneList,
                         organism     = 'hsa',
                         minGSSize    = 15,
                         maxGSSize    = 500,
                         nPermSimple  = 10000,
                         pvalueCutoff = 0.05)

sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Argentina.1252  LC_CTYPE=Spanish_Argentina.1252
[3] LC_MONETARY=Spanish_Argentina.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Argentina.1252

clusterProfiler KEGG GSEA • 390 views

Hard to tell, but one possible reason is a mismatch between your gene identifiers and the ones gseKEGG uses to look up pathways. Another possible reason is that one of the two databases is out of date and reports fewer results (or is more up to date, and therefore better annotated, with some corrections to the pathways). Also, depending on how you ran the analysis in the GSEA desktop software, it might have included databases other than KEGG, in which case you would expect fewer results from gseKEGG.
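
To make the identifier point concrete, here is a minimal base R sketch of the kind of check I mean. It assumes that gseKEGG with organism = 'hsa' matches on NCBI Entrez gene IDs (which is my understanding of the default keyType = "kegg" for human) and that geneList must be a named numeric vector sorted in decreasing order. The table `de` and its column names `entrez_id` and `logFC` are made up for illustration; they are not from your data.

```r
# Hypothetical DE table; the column names and values are assumptions.
de <- data.frame(
  entrez_id = c("7157", "1956", "672", "1956", "ENSG00000141510"),
  logFC     = c(2.1, -1.4, 0.8, -0.9, 1.2),
  stringsAsFactors = FALSE
)

# Entrez gene IDs are purely numeric, so anything else (for example an
# Ensembl ID) would not be matched against the KEGG gene sets.
looks_like_entrez <- grepl("^[0-9]+$", de$entrez_id)
message(sum(!looks_like_entrez), " identifier(s) do not look like Entrez IDs")
de_ok <- de[looks_like_entrez, ]

# Keep one value per gene: here the entry with the largest absolute logFC.
de_ok <- de_ok[order(-abs(de_ok$logFC)), ]
de_ok <- de_ok[!duplicated(de_ok$entrez_id), ]

# Build the ranked list: names are gene IDs, values are the ranking statistic.
geneList <- setNames(de_ok$logFC, de_ok$entrez_id)
geneList <- sort(geneList, decreasing = TRUE)

print(geneList)
```

If a large share of your identifiers fail a check like this, or if the vector is unsorted or has duplicated names, that could be worth fixing before comparing the two methods.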


The GSEA desktop software does include other databases, but I selected the KEGG pathways specifically, so I'm pretty sure it's not that. As for the rest, I guess it could be a mismatch or a database update, but in that case I think I should be having the same problem with the enrichKEGG function, since that one also uses my list of genes and the same database.
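
On the cutoff question from the original post, one way to see whether the threshold is what limits the result would be to rerun gseKEGG with pvalueCutoff = 1, convert the result with as.data.frame(), and count how many gene sets pass at different thresholds. The sketch below does the counting in base R on an invented table, so the IDs and p-values are placeholders and not real output. It assumes the cutoff is applied to the adjusted p-value (the p.adjust column, BH by default), which is my understanding of how clusterProfiler filters. It may also be relevant that the GSEA desktop documentation suggests an FDR threshold of 0.25, which is much looser than 0.05.

```r
# Invented stand-in for as.data.frame(pathways_GSEA) after a run with
# pvalueCutoff = 1; IDs and p-values are placeholders, not real results.
res <- data.frame(
  ID     = paste0("set_", LETTERS[1:6]),
  NES    = c(2.0, -1.9, 1.6, -1.5, 1.3, -1.1),
  pvalue = c(0.001, 0.004, 0.02, 0.03, 0.2, 0.6),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment, mirroring the default pAdjustMethod = "BH".
res$p.adjust <- p.adjust(res$pvalue, method = "BH")

# Count gene sets whose adjusted p-value passes each candidate cutoff.
count_at <- function(res, cutoffs) {
  vapply(cutoffs, function(k) sum(res$p.adjust <= k), integer(1))
}

cutoffs <- c(0.05, 0.10, 0.25)
n_sig <- setNames(count_at(res, cutoffs), cutoffs)
print(n_sig)
```

If the count on the real table jumps sharply between 0.05 and 0.25, the difference from the desktop results may come mostly from the threshold and not from the database or the identifiers.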