Question: Pathway analysis using ReactomePA::enrichPathway(): doubts about minGSSize and maxGSSize arguments
13 days ago by
Brazil/Minas Gerais/Universidade Federal de Ouro Preto
Miguel.Cosenza10 wrote:

Hello everyone,

I am running a pathway analysis on a set of differentially expressed spleen proteins from mice after a parasitic infection. I am new to pathway analysis, and I am having trouble understanding why my enriched categories change when I vary the minGSSize and maxGSSize arguments of enrichPathway() (and of other enrichment functions in clusterProfiler and related packages). I have gone through the vignettes and other documentation without finding this information, so I am treating it as a general question about pathway analysis.

As stated in the documentation, minGSSize is "the minimal size of genes annotated by Ontology term for testing" (default = 10), while maxGSSize is the "maximal size of each geneSet for analyzing" (default = 500). I ran separate enrichment analyses with the following values for these arguments (my input gene list has 364 entries):

library(ReactomePA)

# Run 1: default size limits
Enrich_Hr1.2_1 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 10,
                                maxGSSize = 500
)

# Run 2: lower minimum only
Enrich_Hr1.2_2 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 1,
                                maxGSSize = 500
)

# Run 3: lower minimum and higher maximum
Enrich_Hr1.2_3 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 1,
                                maxGSSize = 1000
)

# Run 4: higher maximum only
Enrich_Hr1.2_4 <- enrichPathway(FC_Hgr1.2$`Entrez ID`, organism = "mouse",
                                readable = FALSE, pvalueCutoff = 0.05,
                                pAdjustMethod = "fdr",
                                minGSSize = 10,
                                maxGSSize = 1000
)

# Enrichment map for each run (X stands for 1 to 4)
enrichMap(Enrich_Hr1.2_X, fixed = FALSE, vertex.label.cex = 1,
          n = 30, vertex.label.font = 1)

When the enrichment maps are plotted, some pathways appear enriched under some settings but are absent under others. For example, Enrich_Hr1.2_3 and Enrich_Hr1.2_4 show a substantial number of proteins in the "Innate Immune System" pathway, whereas Enrich_Hr1.2_1 and Enrich_Hr1.2_2 do not show this Reactome category and instead show others that do not appear in the other two.

I would like to know how these arguments affect the final outcome of the Reactome pathway analysis, and whether there are general recommendations for choosing their values (I don't want to pick them just because one output looks closer to what I expect to observe).

I hope this is not too trivial a question. Reading recommendations on pathway analysis are also welcome. The enrichment maps for each condition can be found here: https://drive.google.com/open?id=1n4UIsJv7Mvx0aufz0Hq-2fuJVVperk5y

Thanks in advance!

10 days ago by
Hong Kong
Guangchuang Yu800 wrote:

In gene sets with fewer than 10 genes, as few as 2 or 3 hits can produce a significant result. Gene sets with more than 500 genes are overly general terms and also have a high chance of yielding significant p-values. That is why these gene sets are excluded from the analysis by default.
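
To make the small-gene-set point concrete, here is a minimal sketch using base R's phyper(), the hypergeometric test that over-representation analysis is generally built on. Only the list size of 364 comes from the question; the background size of 10000 genes, the gene set sizes, and the helper name ora_pvalue are assumptions chosen for illustration, not values taken from ReactomePA.

```r
# Hypothetical numbers: only the list size (364) comes from the question.
universe_size <- 10000  # assumed number of annotated background genes
list_size     <- 364    # size of the submitted gene list

# Upper-tail hypergeometric p-value: P(overlap >= k) for a gene set
# containing `set_size` genes out of the background.
ora_pvalue <- function(k, set_size) {
  phyper(k - 1, set_size, universe_size - set_size, list_size,
         lower.tail = FALSE)
}

p_small <- ora_pvalue(k = 3, set_size = 5)   # 3 hits in a 5-gene set
p_mid   <- ora_pvalue(k = 3, set_size = 50)  # 3 hits in a 50-gene set

print(c(small = p_small, mid = p_mid))
```

Under these assumed numbers, the same three hits are significant at the 0.05 level for the 5-gene set but not for the 50-gene set, which is the behaviour a minimum size threshold is meant to guard against.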

enrichMap only shows the n most significant enriched terms (n = 50 by default), and runs with different parameters are expected to give slightly different results.
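
One likely reason for such differences is multiple-testing correction: as far as I understand, gene sets are filtered by size before testing, so changing minGSSize or maxGSSize changes how many pathways are tested and therefore the adjusted p-values. The sketch below uses base R's p.adjust() with made-up raw p-values; the counts of 20 and 200 tested pathways are assumptions for illustration only.

```r
# Hypothetical raw p-values for three pathways that pass both size filters
p_shared <- c(0.001, 0.010, 0.030)

# Run A: a strict size filter leaves 20 pathways to test
# (the 17 others are given an unremarkable p-value of 0.5)
p_run_a <- c(p_shared, rep(0.5, 17))

# Run B: a looser size filter leaves 200 pathways to test
p_run_b <- c(p_shared, rep(0.5, 197))

# FDR (Benjamini-Hochberg) adjustment, then keep the three shared pathways
adj_a <- p.adjust(p_run_a, method = "fdr")[1:3]
adj_b <- p.adjust(p_run_b, method = "fdr")[1:3]

print(rbind(run_a = adj_a, run_b = adj_b))

# Number of shared pathways passing pvalueCutoff = 0.05 in each run
n_sig_a <- sum(adj_a < 0.05)
n_sig_b <- sum(adj_b < 0.05)
```

The raw p-values of the shared pathways are identical in both runs; only the number of tests differs, and that alone can move a pathway across the 0.05 cutoff.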

The difference is actually smaller than it seems. For comparison, I recommend using the dotplot provided by clusterProfiler. Your comparison is similar to dotplot(..., includeAll=FALSE); I highly recommend dotplot(..., includeAll=TRUE) to compare the results from different runs. See https://guangchuangyu.github.io/2016/11/showcategory-parameter-for-visualizing-comparecluster-output/ for details.
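
Alongside the plots, the result tables themselves can be lined up. This is a rough base R sketch, assuming each run has been converted with as.data.frame() and has ID and p.adjust columns; the toy data frames below are made-up stand-ins for those tables, and the pathway IDs are placeholders.

```r
# Toy stand-ins for as.data.frame(Enrich_Hr1.2_1) and so on; values are made up.
run1 <- data.frame(ID = c("P1", "P2", "P3"),
                   p.adjust = c(0.001, 0.020, 0.040),
                   stringsAsFactors = FALSE)
run3 <- data.frame(ID = c("P1", "P3", "P4"),
                   p.adjust = c(0.002, 0.045, 0.010),
                   stringsAsFactors = FALSE)

runs <- list(run1 = run1, run3 = run3)

# Union of all pathway IDs reported in any run
all_ids <- sort(unique(unlist(lapply(runs, function(x) x$ID))))

# One row per pathway, one column per run; NA means the pathway
# was not reported as enriched in that run.
comparison <- sapply(runs, function(x) x$p.adjust[match(all_ids, x$ID)])
rownames(comparison) <- all_ids

print(comparison)
```

A table like this shows every pathway from every run, so a term missing from one plot can be checked directly instead of inferred from which top terms happened to be drawn.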


Many thanks for your reply! I will definitely use your recommendation to compare the enrichment runs.

written 10 days ago by Miguel.Cosenza10