Pathway analysis using ReactomePA::enrichPathway(): doubts about minGSSize and maxGSSize arguments
@miguelcosenza-14267
Germany / Freiburg / University Medical…

Hello everyone,

I am running a pathway analysis on a set of differentially expressed spleen proteins after a parasitic infection in mice. I am new to pathway analysis, and I am having trouble understanding why my enriched categories change when I vary the minGSSize and maxGSSize arguments of enrichPathway() (and of other enrichment functions in clusterProfiler and related packages). I have gone through the vignettes and other documentation without finding this information, so I suspect it is a general question about pathway analysis.

As stated in the documentation, minGSSize is the "minimal size of genes annotated by Ontology term for testing" (default = 10), while maxGSSize is the "maximal size of each geneSet for analyzing" (default = 500). I ran the enrichment analysis with the following combinations of these arguments (my gene set contains 364 genes):

library(ReactomePA)

# Run 1: default size window (10 to 500)
Enrich_Hr1.2_1 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 10,
                                maxGSSize = 500
)

# Run 2: small gene sets allowed (1 to 500)
Enrich_Hr1.2_2 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 1,
                                maxGSSize = 500
)

# Run 3: small and large gene sets allowed (1 to 1000)
Enrich_Hr1.2_3 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 1,
                                maxGSSize = 1000
)

# Run 4: large gene sets allowed (10 to 1000)
Enrich_Hr1.2_4 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 10,
                                maxGSSize = 1000
)

enrichMap(Enrich_Hr1.2_X, fixed = FALSE, vertex.label.cex = 1,
          n = 30, vertex.label.font = 1)   # enrichment map for each run (X = 1 to 4)

When the enrichment maps are plotted, some pathways appear enriched in some runs but not in others. For example, Enrich_Hr1.2_3 and Enrich_Hr1.2_4 show a substantial number of proteins in the "Innate Immune System" pathway, whereas Enrich_Hr1.2_1 and Enrich_Hr1.2_2 do not show this Reactome category and instead show others that are absent from the other two.

I would like to know how these arguments affect the final outcome of the Reactome pathway analysis, and whether there are general recommendations for choosing their values (I don't want to pick them just because one output looks more like what I expect to observe).

I hope this is not too trivial a question. I would also welcome reading recommendations on pathway analysis. The enrichment maps for each run can be found at this link: https://drive.google.com/open?id=1n4UIsJv7Mvx0aufz0Hq-2fuJVVperk5y

Thanks in advance!

reactomepa pathway analysis enrichment analysis clusterprofiler • 6.0k views
Guangchuang Yu ★ 1.2k
@guangchuang-yu-5419
China/Guangzhou/Southern Medical Univer…

In gene sets with fewer than 10 genes, an overlap of just 2 or 3 genes can produce a significant result. Gene sets with more than 500 genes are overly general terms and also have a high chance of reaching significant p values. That is why such gene sets are excluded from the analysis by default.
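
To make the point about set size concrete, here is a minimal base R sketch of a one-sided hypergeometric test, which is the kind of over-representation test these functions rely on. Only the query size of 364 comes from the question; the background size, the gene set sizes, and the overlaps are made-up numbers chosen for illustration, and enrich_p is a hypothetical helper, not a ReactomePA function.

```r
# Hypothetical numbers for illustration; only the query size (364) comes from the question.
background_size <- 20000   # assumed number of annotated background genes
query_size      <- 364     # size of the submitted gene list

# One-sided hypergeometric p-value: P(overlap >= k) for a gene set of a given size.
enrich_p <- function(overlap, set_size) {
  phyper(overlap - 1, set_size, background_size - set_size,
         query_size, lower.tail = FALSE)
}

p_small <- enrich_p(overlap = 3,  set_size = 5)     # tiny set, only 3 hits
p_large <- enrich_p(overlap = 55, set_size = 2000)  # broad set, about 1.5x the expected overlap

expected_large <- query_size * 2000 / background_size  # hits expected by chance

print(c(p_small = p_small, p_large = p_large, expected_large = expected_large))
```

Under these assumptions, three hits in a five-gene set and a modest excess over chance in a 2000-gene set both give small raw p values, even though neither is very informative biologically.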

enrichMap only shows the n most significant enriched terms (n = 50 by default), and runs with different parameters are expected to give slightly different results.
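
One reason the runs differ can be sketched in base R, assuming the size filter is applied before the multiple-testing correction: changing the window changes which terms are tested, and therefore how many tests the "fdr" adjustment accounts for. The table below is entirely made up (term IDs, sizes, and raw p values are hypothetical), and run_window is an illustrative helper, not the package's implementation.

```r
# Hypothetical per-term results: gene set size and raw p value (all values made up).
sets <- data.frame(
  id     = paste0("P", 1:8),
  size   = c(4, 8, 25, 60, 150, 450, 700, 950),
  pvalue = c(0.20, 0.30, 0.001, 0.02, 0.04, 0.5, 0.0001, 0.6),
  stringsAsFactors = FALSE
)

# Keep terms inside the size window, adjust p values over the kept terms only,
# then apply the significance cutoff.
run_window <- function(sets, minGSSize, maxGSSize, cutoff = 0.05) {
  kept <- sets[sets$size >= minGSSize & sets$size <= maxGSSize, ]
  kept$p.adjust <- p.adjust(kept$pvalue, method = "fdr")
  kept[kept$p.adjust <= cutoff, ]
}

res_default <- run_window(sets, minGSSize = 10, maxGSSize = 500)
res_wide    <- run_window(sets, minGSSize = 1,  maxGSSize = 1000)

print(res_default$id)
print(res_wide$id)
```

In this toy table the large term P7 can only appear in the wide window, while P4 passes the cutoff in the default window but not in the wide one, because more terms are being corrected for. The real functions also apply other filters (for example a q-value cutoff) that this sketch ignores.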

The difference is actually smaller than it seems. For comparison, I recommend the dotplot provided by clusterProfiler. Your comparison is similar to dotplot(..., includeAll=FALSE); I highly recommend using dotplot(..., includeAll=TRUE) to compare the results from different runs. See https://guangchuangyu.github.io/2016/11/showcategory-parameter-for-visualizing-comparecluster-output/ for details.
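
The idea behind includeAll can be emulated roughly in base R. This is only a sketch of the concept with made-up tables (run labels, term IDs, and adjusted p values are hypothetical), not how clusterProfiler implements it: with top-n selection per run, a term can look absent from a run where it is in fact enriched but ranked lower.

```r
# Hypothetical enrichment tables from two runs (all values made up).
run_a <- data.frame(run = "A", term = c("T1", "T2", "T3"),
                    p.adjust = c(0.001, 0.010, 0.030), stringsAsFactors = FALSE)
run_b <- data.frame(run = "B", term = c("T3", "T4", "T1"),
                    p.adjust = c(0.002, 0.004, 0.040), stringsAsFactors = FALSE)

# Top n most significant terms of one run.
top_n <- function(df, n) head(df[order(df$p.adjust), ], n)

n <- 2

# Similar in spirit to includeAll = FALSE: each run contributes only its own top n.
top_only <- rbind(top_n(run_a, n), top_n(run_b, n))

# Similar in spirit to includeAll = TRUE: for every term selected in any run,
# show it in all runs where it is enriched.
shown_terms <- unique(top_only$term)
all_runs    <- rbind(run_a, run_b)
include_all <- all_runs[all_runs$term %in% shown_terms, ]

print(table(top_only$term, top_only$run))
print(table(include_all$term, include_all$run))
```

In the first table T1 and T3 each appear in one run only, which looks like a difference between runs; in the second table both terms show up in both runs.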


Many thanks for your reply! I will definitely follow your recommendation for comparing enrichment runs.
