Question

Confusion about BgRatio in the enrichKEGG result of clusterProfiler

nature.hunger ▴ 30

I am trying to run the clusterProfiler examples from "http://www.bioconductor.org/packages/devel/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html#abstract" and "https://guangchuangyu.github.io/2016/05/convert-biological-id-with-kegg-api-using-clusterprofiler/". I have a question about the BgRatio column: why do these two examples differ so much in the number of background genes? I checked the latest human KEGG pathway database, and it contains only 7192 genes. Am I making a mistake?

In the Bioconductor example, there are 7164 background genes:

kk <- enrichKEGG(gene         = gene,
                 organism     = 'hsa',
                 pvalueCutoff = 0.05)
head(kk)

...

##           BgRatio       pvalue     p.adjust       qvalue
## hsa04110 124/7164 1.706341e-07 2.951969e-05 0.0000290976
## hsa04114 124/7164 1.569415e-06 1.357544e-04 0.0001338133
## hsa03320  72/7164 1.884398e-05 1.086670e-03 0.0010711317
## hsa04914  98/7164 9.664771e-04 4.180013e-02 0.0412024432
## hsa04115  69/7164 1.226862e-03 4.244943e-02 0.0418424543
## hsa04062 187/7164 1.485638e-03 4.283588e-02 0.0422233843

...

In the GitHub example, there are 9275 background genes:

x <- enrichKEGG(np2up[,2], organism='hsa', keyType='uniprot')

...

##           BgRatio       pvalue   p.adjust     qvalue
## hsa04072 216/9275 0.0002654190 0.03901659 0.03240905
## hsa04060 354/9275 0.0005349245 0.03931695 0.03265855
## hsa04390 213/9275 0.0009536247 0.04199404 0.03488227
## hsa04975  58/9275 0.0014014886 0.04199404 0.03488227
## hsa05221  86/9275 0.0014283687 0.04199404 0.03488227

...

clusterprofiler • 2.7k views

updated 7.3 years ago by Guangchuang Yu ★ 1.2k • written 7.3 years ago by nature.hunger ▴ 30

score 0 · Answer 1 · 2016-12-20

Guangchuang Yu ★ 1.2k

The number changed because your input keyType changed.

The actual gene annotation covers 7164 genes (this may change when KEGG is updated), and it is based on ENTREZGENEID. When the gene IDs are mapped to UniProt, the number of annotated proteins increases because multiple mappings exist (ID mapping is not always 1-to-1).
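As a minimal sketch of why a one-to-many mapping inflates the background, the toy table below maps three genes to six proteins. The IDs are made up for illustration and are not taken from KEGG or UniProt; only base R is used.

```r
# Toy Entrez -> UniProt mapping (hypothetical IDs, for illustration only).
# Gene "1" maps to two proteins, gene "3" to three.
id_map <- data.frame(
  entrez  = c("1", "1", "2", "3", "3", "3"),
  uniprot = c("P0001", "P0002", "P0003", "P0004", "P0005", "P0006"),
  stringsAsFactors = FALSE
)

# Background size is the number of distinct annotated IDs of the chosen type.
n_entrez  <- length(unique(id_map$entrez))   # gene-level background
n_uniprot <- length(unique(id_map$uniprot))  # protein-level background

cat("Entrez background: ", n_entrez, "\n")
cat("UniProt background:", n_uniprot, "\n")
```

The same set of annotations yields a larger denominator once it is counted in protein IDs, which is the effect seen in the two BgRatio columns.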

7.3 years ago • Guangchuang Yu ★ 1.2k

Thanks a lot! I am trying to apply this to a list of differentially expressed genes from my project, and I have tried both methods. The number of genes in the same enriched KEGG pathway differs, and consequently the KEGG pathways are ranked differently. Does that mean I can only use my gene list as in the Bioconductor example? Can't I convert my gene list to UniProt IDs for the analysis?

7.3 years ago • nature.hunger ▴ 30

It depends on whether your input list is at the gene level or the protein level.
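To see why the two ID levels can rank pathways differently, here is a rough sketch of the hypergeometric over-representation test that underlies this kind of enrichment analysis, written with base R's `phyper`. The helper name `enrich_p` and all counts passed to it are hypothetical; only the background sizes 7164 and 9275 come from the outputs quoted in this thread.

```r
# P(X >= k) for X ~ Hypergeometric:
#   k = input IDs annotated to the pathway
#   n = input IDs with any annotation
#   M = background IDs annotated to the pathway (BgRatio numerator)
#   N = all annotated background IDs (BgRatio denominator)
enrich_p <- function(k, n, M, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# Hypothetical counts for one pathway, counted at the gene level and
# again at the protein level after a one-to-many ID mapping.
p_gene    <- enrich_p(k = 10, n = 200, M = 124, N = 7164)
p_protein <- enrich_p(k = 10, n = 200, M = 160, N = 9275)

print(c(gene = p_gene, protein = p_protein))
```

Because k, n, M and N all change with the ID type, the p-values (and hence the pathway ranking) need not agree between the two runs.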

7.3 years ago • Guangchuang Yu ★ 1.2k

Thanks. I have another question: why doesn't the setReadable function support enrichKEGG output with Entrez gene IDs, while it can be used with UniProt IDs? I found via Google that previous versions had a readable parameter, but it no longer works.

7.3 years ago • nature.hunger ▴ 30

The `setReadable` function has always existed and works with enrichKEGG output.

I guess you mean the `readable` parameter.

enrichKEGG now works with online data and supports more than 4000 species. For most of these species, there are no data to support ID conversion, so the `readable` parameter was removed.

For species that have an OrgDb object/package available, you can still convert IDs using the `setReadable` function.

If some ID types work for you and others do not, follow the guide at https://guangchuangyu.github.io/2016/07/how-to-bug-author/ and post a reproducible example.

7.3 years ago • Guangchuang Yu ★ 1.2k

Thank you very much! The reason is that I am not using the newest version of clusterProfiler, probably because my Bioconductor (v3.3) is not the newest version. When I follow the installation instructions:

source("https://bioconductor.org/biocLite.R")
biocLite("clusterProfiler")

it automatically downloads clusterProfiler 3.0.5, but the up-to-date version is 3.2.8.

7.3 years ago • nature.hunger ▴ 30

The release version of Bioconductor is 3.4.

You should use the latest clusterProfiler.
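A quick way to confirm whether an installation is behind is to compare version objects in base R. The sketch below hard-codes the two version numbers mentioned in this thread; the commented lines show how the installed version could be read on a real system, and they assume clusterProfiler is installed. Note that the `biocLite` installer quoted above was later superseded by the BiocManager package.

```r
# Version numbers taken from this thread (not queried from the system).
installed_version <- package_version("3.0.5")
latest_version    <- package_version("3.2.8")

# package_version objects compare component by component.
outdated <- installed_version < latest_version
cat("Outdated:", outdated, "\n")

# On a real installation (assumes the package is installed):
# packageVersion("clusterProfiler")
# On current Bioconductor releases, installation goes through BiocManager:
# BiocManager::install("clusterProfiler")
```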

See the `setReadable` section in https://guangchuangyu.github.io/2016/05/convert-biological-id-with-kegg-api-using-clusterprofiler/

If your input gene list consists of Entrez gene IDs, you can use something like:

y <- setReadable(x, 'org.Hs.eg.db', keytype="ENTREZID")

7.3 years ago • Guangchuang Yu ★ 1.2k

Thanks! It works fine now!

7.3 years ago • nature.hunger ▴ 30