Question

How can we use the organism database created by makeOrgPackageFromNCBI for KEGG and GO analysis with the clusterProfiler package?

Aastha Kapoor • 0

I have transcriptome data for a bacterial genome that was sequenced in-house. I built an annotation database for my bacterium using the makeOrgPackageFromNCBI function.

How can I use this database for KEGG and GO analysis (ORA or GSEA)? All the commands I have seen use a KEGG organism database for such an analysis. I tried that approach (the KEGGrich command), but the output indicated that "Gene ids did not match".

Alternatively, I could use the above method with the strain closest to my species in the KEGG database. I have a set of gene IDs (e.g. R30_hybrid_002367) for which I have extracted KO and GO IDs using the eggNOG tool. Can I map these KO IDs as term2gene, and how can I obtain the matching term2name table?

Suppose I use the KEGG database of a closely related species (with term2gene and term2name) for the analysis, and I have identified the KO IDs of my filtered DEGs using eggNOG. For some genes, such as sRNAs or hypothetical proteins, I do not have KO IDs. Will those genes be missing from my analysis?

Please help me solve this issue.

clusterProfiler OrganismDb • 1.2k views

ADD COMMENT • link updated 7 months ago by Guido Hooiveld ★ 4.0k • written 7 months ago by Aastha Kapoor • 0

Please show the lines of code you tried, as well as the content of your input files. It seems you have all the annotation information that is required. Also double-check your code against the help pages of each function, because (for example) the function KEGGrich doesn't exist. The help pages also describe the required format of the term2gene and term2name objects (both are data.frames). You may want to check this thread (for KEGG) and this thread (for GO) on how to obtain the term2gene and term2name objects.

You already found the aforementioned thread on the use of KO ids with clusterProfiler, so what doesn't work for you? Again, please show your code.

Yes, if entries are not mapping to a KO id they will indeed be filtered and not included in a subsequent analysis.
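To see in advance how many genes would be dropped for this reason, one could compare the gene list against the annotation table. This is only a minimal base-R sketch; the gene ids, KO ids and object names below are invented for illustration and are not taken from the poster's data.

```r
# Hypothetical TERM2GENE-style table: column 1 = term (KO id), column 2 = gene id
term2gene <- data.frame(
  term = c("K00001", "K00002", "K00001"),
  gene = c("gene_a", "gene_a", "gene_b"),
  stringsAsFactors = FALSE
)

# Hypothetical DEG list that includes genes without any KO id
degs <- c("gene_a", "gene_b", "srna_1", "hyp_1")

# TRUE for genes that appear in the annotation table, FALSE otherwise
mapped <- degs %in% term2gene$gene

# Genes that cannot contribute to an enrichment test based on this table
unmapped_genes <- degs[!mapped]
print(unmapped_genes)
```

Reporting the number of unmapped genes alongside the enrichment results makes clear what fraction of the gene list the analysis actually covers.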

No disrespect intended, but based on the questions you posted I think you would benefit most from asking for guidance from someone more experienced at your local institute.

ADD REPLY • link 7 months ago Guido Hooiveld ★ 4.0k

Thank you so much for answering. I was wondering how to link the database I created (using makeOrgPackageFromNCBI) for KEGG and GO analysis. Do I need to pass it as the "organism object" (org=) in the KEGG enrichment code, or do I need to link it via the term2gene and term2name method as explained in this thread?

ADD REPLY • link 7 months ago Aastha Kapoor • 0

What did you try yourself? Again, please show your code!

For KEGG-based analysis you can make use of the convenience functions enrichKEGG and gseKEGG. If you use these, you do NOT need an OrgDb, just the KEGG organism code of your organism of interest (= hsy). See ?enrichKEGG and ?gseKEGG.

For GO-based analysis you can make use of the convenience functions enrichGO and gseGO. For these 2 functions you will indeed need an OrgDb. See ?enrichGO and ?gseGO.
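As a rough sketch of what an enrichGO call with a self-built OrgDb could look like: the package name, gene ids and the pick_keytype helper below are hypothetical, and the assumption that the database exposes a "GID" key type should be checked with keytypes() on your own OrgDb. The gene ids passed in must be of the same type as the keys stored in the database.

```r
# Hypothetical helper: pick the first preferred key type the OrgDb offers
pick_keytype <- function(available, preferred = c("GID", "ENTREZID")) {
  hit <- preferred[preferred %in% available]
  if (length(hit) == 0) stop("none of the preferred key types were found")
  hit[1]
}

orgdb_pkg <- "org.Hypothetical.eg.db"  # hypothetical name of the self-built package
degs <- c("1234567", "2345678")        # hypothetical gene ids, same type as the OrgDb keys

# Only runs when clusterProfiler and the (hypothetical) OrgDb package are installed
if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace(orgdb_pkg, quietly = TRUE)) {
  # OrgDb packages export an object with the same name as the package
  orgdb <- getExportedValue(orgdb_pkg, orgdb_pkg)
  key_type <- pick_keytype(AnnotationDbi::keytypes(orgdb))
  ego <- clusterProfiler::enrichGO(
    gene = degs, OrgDb = orgdb, keyType = key_type, ont = "BP"
  )
  print(head(as.data.frame(ego)))
}
```

If the ids in the gene list are locus tags from your own annotation rather than the ids stored in the OrgDb, no genes will map, which is one plausible cause of an "ids did not match" type of message.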

The nice thing about these convenience functions is that they automagically take care of retrieving and formatting the required annotation information (= TERM2GENE and TERM2NAME); you only need to provide a list of (selected) genes as input (of course with the proper type of id).

Yet, in the end these convenience functions all make use of the generic functions enricher and GSEA. You could also use these generic functions directly yourself. If you do so, in both cases you will have to provide the TERM2GENE and TERM2NAME tables as well. This may be useful if, for example, you already have a table in which ids are mapped to a GO category, or any other gene set. This allows for flexibility and also permits you to skip the creation of an OrgDb. See also: https://yulab-smu.top/biomedical-knowledge-mining-book/universal-api.html
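To make the generic route more concrete, here is a minimal sketch of building TERM2GENE and TERM2NAME from an annotation table and passing them to enricher. It assumes an eggNOG-mapper-style table with comma-separated GO ids per gene and "-" for genes without annotation; the gene ids, column names and DEG list are invented for illustration, and the relaxed cutoffs are only there because the toy data set is tiny.

```r
# Hypothetical eggNOG-mapper-style annotation table (ids invented for illustration)
annot <- data.frame(
  gene = c("gene_0001", "gene_0002", "gene_0003", "gene_0004"),
  GOs  = c("GO:0006096,GO:0008152", "GO:0006412", "-", "GO:0006096"),
  stringsAsFactors = FALSE
)

# Assumption: "-" marks a gene without annotation; such genes are dropped here
annotated <- annot[annot$GOs != "-", ]

# One row per (term, gene) pair; enricher() expects the term in the first column
go_list <- strsplit(annotated$GOs, ",", fixed = TRUE)
term2gene <- data.frame(
  term = unlist(go_list),
  gene = rep(annotated$gene, lengths(go_list)),
  stringsAsFactors = FALSE
)

# Term ids in the first column, human-readable names in the second
term2name <- data.frame(
  term = c("GO:0006096", "GO:0006412", "GO:0008152"),
  name = c("glycolytic process", "translation", "metabolic process"),
  stringsAsFactors = FALSE
)

degs <- c("gene_0001", "gene_0004")  # hypothetical list of selected genes

# Only runs when clusterProfiler is installed
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  res <- clusterProfiler::enricher(
    gene = degs, universe = unique(term2gene$gene),
    TERM2GENE = term2gene, TERM2NAME = term2name,
    minGSSize = 1, pvalueCutoff = 1, qvalueCutoff = 1
  )
  print(as.data.frame(res))
}
```

The same two tables can be passed to GSEA together with a ranked, named numeric vector instead of a gene list. Note that gene_0003 never enters term2gene, so it cannot contribute to the test.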

ADD REPLY • link 7 months ago Guido Hooiveld ★ 4.0k