EGSEA: how does the program generate the value for the up or down regulation of gene sets or pathways
yingchen • 0
@yingchen-11543

Hi guys,

I tried the EGSEA package, hoping to find a better way to do GSEA. The output column Direction under gsa$mylabel$test.results$mycontrast gives the values 1, 0 and -1. I assume that -1 means the gene set/pathway is down-regulated and 1 means it is up-regulated. My problem is that most of the gene sets/pathways in my test study are -1, and thus down-regulated, which is not consistent with the results I got with other programs such as GAGE (which is one of the tests in EGSEA). I read the pre-print and found no clue. Any suggestions?

Thanks a lot,

Ying

pathways gsea egsea gage

@monther-alhamdoosh-10001
Australia/Melbourne/CSL Limited

Hi Ying,

Thanks for trying out our package! The Direction column in EGSEA is calculated from the logFC values, which are computed using limma::topTable if they were not provided. We simply count the number of genes that are up- and down-regulated in the gene set and make a decision based on the direction of the majority. Note that the argument "logFC.cutoff" is used in this calculation; it is 0 by default. To inspect the logFC values that are used in the calculation, click on the "Interpret Results" link in the EGSEA report and look at the CSV files of the gene sets of interest.

Hope this helps.

Best,

Monther

Hi Monther,

Thanks a lot for the explanation! I have another question: what is the recommended cut-off for significant DE gene sets and pathways? When I used GAGE on the same data set, in one contrast there were only 13 KEGG pathways with q.val <= 0.01. When I did the analysis with EGSEA, the q.val values were much smaller (~10E-7) and there were far more than 13 DE pathways.

Another thing: is it possible to integrate GOexpress into your ensemble?

Thanks a lot,

Ying

Hi Ying,

No worries! That's right.
EGSEA produces p-values that are much smaller than those of the individual methods, particularly for gene sets that are significant in the majority of base methods. Another factor that affects the scale of EGSEA p-values is the choice of individual methods, as some of them produce very small p-values. You can set print.base=TRUE and then look into gsa@results$gslabel$base.results$contrast$base_method to see which method produces very small p-values. You can also try different p-value combining methods by setting the argument "combineMethod" (see egsea.combine()). However, rather than applying a p-value cut-off, we recommend looking at the top N (N=10-30) gene sets ranked using the different EGSEA scores and then using the p-values to support your findings.
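To see why a combined p-value can end up far below any of the individual ones, here is a minimal sketch in Python of Fisher's method, one well-known way of combining p-values. It is only an illustration: it is not EGSEA's code, it is not necessarily the method EGSEA uses by default, and the function name fisher_combine is made up for this example. It also assumes the individual tests are independent, which base methods run on the same data are not, so real combined values tend to be optimistic.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (illustrative helper, not EGSEA's API).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null, assuming independent tests. For even
    degrees of freedom the upper tail has a closed form, so no SciPy is needed.
    """
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Upper tail of chi-square(2k): exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Five base methods that each report p = 0.01 for the same gene set
combined = fisher_combine([0.01] * 5)
print(combined)  # several orders of magnitude below 0.01
```

With a single p-value the function returns that value unchanged; with several moderately small ones the combined value shrinks quickly, which is consistent with the advice above to rank gene sets rather than rely on a fixed cut-off.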

Thanks for mentioning GOExpress. We will look into the possibility of integrating it with our package.

Best,

Monther

P.S. I recommend using the development version of EGSEA, as it has been significantly improved since our first release.
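For anyone who wants to check the Direction value of a gene set by hand, the majority rule described earlier in this thread can be sketched in Python as follows. This is an illustration only, not EGSEA's implementation: the function name gene_set_direction is hypothetical, and two details are assumptions that the thread does not confirm, namely that the cut-off is applied symmetrically to the absolute logFC and that a tie gives 0.

```python
def gene_set_direction(log_fcs, logfc_cutoff=0.0):
    """Majority vote over the genes of one gene set (hypothetical helper).

    Assumptions: a gene counts as up if logFC > cutoff and as down if
    logFC < -cutoff; equal counts yield 0.
    """
    up = sum(1 for fc in log_fcs if fc > logfc_cutoff)
    down = sum(1 for fc in log_fcs if fc < -logfc_cutoff)
    if up > down:
        return 1
    if down > up:
        return -1
    return 0


# logFC values as they might appear in a gene set's CSV file (made-up numbers)
log_fcs = [0.05, 0.02, -0.8]
print(gene_set_direction(log_fcs))                    # 1: two genes up, one down
print(gene_set_direction(log_fcs, logfc_cutoff=0.1))  # -1: only the -0.8 gene passes
```

Because the vote counts genes and ignores the size of each change, many small shifts in one direction can outweigh a few large changes in the other. That may be one reason the sign differs from what a magnitude-based method reports, and raising logFC.cutoff is one way to test it.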


Hi Monther,

It looks like the data files generated at the level of individual genes (e.g. the heatmap.csv files / "Interpret Results" downloads) are missing the Entrez gene ID column. Could you please check on that? I am using EGSEA 1.10.1 on R 3.5.1.

Thank you!