Question

topGO - treating up- and down-regulated genes separately

predeus • 0

@predeus-9207

Last seen 4.7 years ago

United States

I've recently started using topGO in my expression analysis and like it a lot. The "elimination" paradigm is a very useful one when doing pathway analysis, since many different pathways do in fact overlap in the genes they contain.

In the manual, all the examples use p-values as the statistic and never separate the up- and down-regulated genes. Shouldn't they be considered separately?

Correspondingly, would it be necessary to modify how the Kolmogorov-Smirnov statistic and KS-elim are applied? In classical GSEA, the statistic carries a plus or minus sign to indicate whether the gene is up- or down-regulated. Since topGO only uses the p-values, I am curious how the two groups can be separated.

Additionally, it would be nice to understand how the p-values are used in the Kolmogorov-Smirnov (like) methods in topGO, and whether it is possible to use another statistic, such as the t-statistic from limma or the Wald statistic from DESeq2.

topGO GSEA pathway analysis • 2.7k views

updated 7.8 years ago by Lluís Revilla Sancho ▴ 760 • written 7.8 years ago by predeus • 0

score 0 · Answer 1 · 2017-03-10

Lluís Revilla Sancho ▴ 760

@lluis-revilla-sancho

Last seen 8 days ago

European Union

You can use any statistic you want. The statistic is used to select the genes with geneSelectionFun. (Be aware that geneSelectionFun doesn't work properly.)

However, I can't shed any light on how the p-values are calculated. (You can get a p-value for a GO term without any significant gene in that term. How? I don't know.)

ADD COMMENT • link 7.8 years ago Lluís Revilla Sancho ▴ 760

Yeah, it's pretty clear how it works for Fisher's test and the like - you just select the genes you consider significant and calculate the p-value for the 2x2 table.
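To make the 2x2-table case concrete, here is a minimal sketch in plain Python (not topGO code). It computes a one-sided, over-representation Fisher p-value as a hypergeometric tail. The function name, argument names and the counts in the example are all made up for illustration.

```python
from math import comb


def fisher_enrichment_p(n_universe, n_significant, n_in_term, n_sig_in_term):
    """One-sided Fisher p-value for over-representation.

    Returns P(X >= n_sig_in_term), where X is the number of significant
    genes falling in the term when n_significant genes are drawn at random
    from a universe of n_universe genes, n_in_term of which are annotated
    to the term.
    """
    total = comb(n_universe, n_significant)
    upper = min(n_in_term, n_significant)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_significant - k)
        for k in range(n_sig_in_term, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes in the universe, 5 called significant,
# 4 annotated to the term, 3 of those 4 significant.
p_example = fisher_enrichment_p(20, 5, 4, 3)
print(p_example)
```

Up- and down-regulated genes can be handled here simply by building two separate "significant" lists and running the test once for each.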

It's much trickier for KS (or KS-like, as in GSEA) - the method should add some statistic (raised to some power) to the running sum, right? So with p-values alone there's no way to distinguish up- and down-regulated genes - you just lump them all together.
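A small sketch of what I mean, in plain Python. This is a GSEA-style weighted running sum written from the general description of that method, not topGO's KS implementation, and the gene names and t-statistics are invented. It compares ranking by a signed statistic with ranking by an unsigned one (which is what ranking by p-value amounts to).

```python
def running_sum_es(scores, gene_set, power=1.0):
    """GSEA-style enrichment score: the largest deviation from zero of a
    weighted running sum over genes ranked by score (highest first).

    scores: dict mapping gene -> ranking statistic.
    gene_set: set of genes belonging to the pathway / GO term.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hit_weights = {g: abs(scores[g]) ** power for g in ranked if g in gene_set}
    hit_total = sum(hit_weights.values())
    n_miss = len(ranked) - len(hit_weights)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running, best = 0.0, 0.0
    for g in ranked:
        if g in hit_weights:
            running += hit_weights[g] / hit_total  # step up on a set member
        else:
            running -= 1.0 / n_miss  # step down on a non-member
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical t-statistics: a, b are up-regulated, e, f are down-regulated.
t_stats = {"a": 5.0, "b": 4.0, "c": 0.5, "d": -0.3, "e": -4.5, "f": -6.0}
unsigned = {g: abs(t) for g, t in t_stats.items()}  # stand-in for p-value ranking

es_up = running_sum_es(t_stats, {"a", "b"})            # positive: set is up
es_down = running_sum_es(t_stats, {"e", "f"})          # negative: set is down
es_mixed_signed = running_sum_es(t_stats, {"a", "f"})
es_mixed_unsigned = running_sum_es(unsigned, {"a", "f"})
print(es_up, es_down, es_mixed_signed, es_mixed_unsigned)
```

With the signed ranking the sign of the score tells you the direction, and a set with one up and one down gene gets a weaker score. With the unsigned ranking the same mixed set sits entirely at the top of the list and looks maximally enriched, which is the lumping I'm worried about.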

I think this might be very misleading to a lot of users and produce meaningless results.

The bug you are describing is also interesting. Did you try reporting it?

7.8 years ago • predeus • 0

You can distinguish up- and down-regulated genes if you use the t-statistic or the fold change (or log fold change). How is the statistic used in the KS approach? I am not sure.
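As a rough illustration of the first point, here is a plain Python sketch (not topGO code) that splits genes into up and down lists using the sign of the t-statistic together with a p-value cutoff. The function name, the cutoff and the numbers are assumptions made up for the example.

```python
def split_by_direction(results, alpha=0.05):
    """Split significant genes by the sign of their t-statistic.

    results: dict mapping gene -> (t_statistic, p_value).
    Returns (up, down) as two alphabetically sorted lists of gene names.
    """
    up = sorted(g for g, (t, p) in results.items() if p < alpha and t > 0)
    down = sorted(g for g, (t, p) in results.items() if p < alpha and t < 0)
    return up, down


# Hypothetical differential-expression results.
de_results = {
    "a": (5.0, 0.001),
    "b": (4.0, 0.004),
    "c": (0.5, 0.60),
    "d": (-0.3, 0.75),
    "e": (-4.5, 0.002),
    "f": (-6.0, 0.0005),
}
up_genes, down_genes = split_by_direction(de_results)
print(up_genes, down_genes)
```

Each list can then be passed to the enrichment test as its own set of "interesting" genes.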

Yes, I reported the bug several times.

ADD REPLY • link 7.8 years ago Lluís Revilla Sancho ▴ 760