Question

clusterProfiler - GeneRatio & nb of genes always positively correlated in dotplot

1


Jane Merlevede ▴ 90

@jane-merlevede-5019

Last seen 7.3 years ago

Hello,

Sorry for asking a lot of questions about clusterProfiler. I am trying to fully understand the results and graphs I want to use.

When using dotplot() on the results of enrichGO() and enrichDO(), I always observe that the most significant terms are the ones with the largest number of genes and the highest gene ratio. Yet one could imagine a very significant term with a very high gene ratio but a small (or at least not the largest) number of genes.

I also observed this in all the examples I saw in the vignette and on the web.

What is the rationale behind this? As it stands, small gene sets can never appear significant.

Jane

clusterProfiler • 5.5k views

ADD COMMENT • link 9.2 years ago Jane Merlevede ▴ 90

score 0 · Answer 1 · 2016-11-10

0


Guangchuang Yu ★ 1.2k

@guangchuang-yu-5419

Last seen 4 months ago

China/Guangzhou/Southern Medical Univer…

The p-value is not correlated with gene ratio; see the color of the plot here.

ADD COMMENT • link 9.2 years ago Guangchuang Yu ★ 1.2k

score 0 · Answer 2 · 2016-11-11

0


Jane Merlevede ▴ 90

@jane-merlevede-5019

Last seen 7.3 years ago

Indeed, the p-value is not correlated with gene ratio, in my examples as well. But that is not the relationship I mentioned. I am talking about the positive correlation between the number of genes and the gene ratio. Can you answer this please?

Your example illustrates my point perfectly: it is one more case in which no term combines a high gene ratio with a small number of genes, although one could expect to see such terms.

ADD COMMENT • link 9.2 years ago Jane Merlevede ▴ 90

0


Gene ratio is k/n and gene count is k. Since n comes from the input gene list and is the same for every term, the two are indeed positively related. https://bioconductor.org/packages/devel/bioc/vignettes/DOSE/inst/doc/enrichmentAnalysis.html#over-representation-analysis
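To make the relationship concrete, here is a minimal arithmetic sketch in Python. It is not clusterProfiler code: the term names and all numbers are hypothetical, and it only assumes, as stated in this thread, that n is shared by every term in one analysis.

```python
# Hypothetical numbers for illustration only; not taken from a real analysis.
n = 200  # size of the input gene list, shared by every term

# k = number of input genes annotated to each term (the gene count)
counts = {"term_A": 5, "term_B": 20, "term_C": 60}

# Gene ratio as defined in the thread: k / n
gene_ratio = {term: k / n for term, k in counts.items()}

# Because n is constant, ordering terms by gene ratio is the same as
# ordering them by gene count.
by_count = sorted(counts, key=counts.get)
by_ratio = sorted(gene_ratio, key=gene_ratio.get)

print(gene_ratio)
print("same ordering:", by_count == by_ratio)
```

With a fixed n, a larger k always gives a larger k/n, which is why the dot size and the x position move together in the plot.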

ADD REPLY • link 9.2 years ago Guangchuang Yu ★ 1.2k

score 0 · Answer 3 · 2016-11-14

0


Jane Merlevede ▴ 90

@jane-merlevede-5019

Last seen 7.3 years ago

Thank you for your answer. I thought n was the number of genes in a category and not the size of the input list of genes of interest. That is why I did not expect a correlation between the number of genes and the gene ratio.

It would be great to be able to plot this information as well (the ratio of genes of interest to the number of genes in a category), that is, to add a third possible value for "x".

ADD COMMENT • link 9.2 years ago Jane Merlevede ▴ 90

0


You mean k/M?

ADD REPLY • link 9.2 years ago Guangchuang Yu ★ 1.2k

0


Yes, exactly.

ADD REPLY • link 9.2 years ago Jane Merlevede ▴ 90

2


I will not change gene ratio (k/n, data from the input gene list) and background ratio (M/N, data from the background annotation), as they have existed in the package for quite a long time.

Maybe I can introduce another column, namely an odds ratio computed as (gene ratio)/(background ratio), in a future release.
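The quantities discussed in this thread can be compared side by side with a small Python sketch. It is only an illustration under stated assumptions: the function name `enrichment_ratios` and all numbers are hypothetical, M is taken to be the number of background genes in the category and N the size of the background, and nothing here is part of the clusterProfiler API.

```python
def enrichment_ratios(k, n, M, N):
    """Return the ratios discussed in the thread for one term.

    k: input genes annotated to the term
    n: size of the input gene list
    M: background genes annotated to the term (assumed meaning)
    N: size of the background (assumed meaning)
    """
    gene_ratio = k / n
    bg_ratio = M / N
    return {
        "gene_ratio": gene_ratio,                  # k/n
        "bg_ratio": bg_ratio,                      # M/N
        "k_over_M": k / M,                         # share of the category hit
        "ratio_of_ratios": gene_ratio / bg_ratio,  # (k/n) / (M/N)
    }

# Hypothetical small and large categories, same input list and background.
small = enrichment_ratios(k=8, n=200, M=10, N=20000)
large = enrichment_ratios(k=40, n=200, M=2000, N=20000)

for name, res in (("small", small), ("large", large)):
    print(name, res)
```

In this made-up example the small category has the lower gene ratio (8/200 against 40/200) but the higher k/M (8/10 against 40/2000), which is the situation the original question was asking about.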

ADD REPLY • link 9.2 years ago Guangchuang Yu ★ 1.2k