Question

KEGG enrichment analysis for non model organism

afsanarupa1 • 0

Bangladesh

I am working with RNA-seq data from a non-model organism, Tenualosa ilisha. De novo assembly was performed with Trinity. For the KEGG enrichment analysis, I first ran blastx against the KEGG database; the BLAST output file has a column containing KO IDs. I also performed a differential expression analysis of the transcripts. So I have two files: one containing transcript IDs and KO IDs, and the other containing the differential expression results for the transcripts. With these, how can I proceed with the KEGG enrichment analysis? Can anyone please provide the complete pipeline, with code and a step-by-step process?

Any advice would be much appreciated! Thanks in advance!

RNAseq123 • 3.1k views

ADD COMMENT • link 16 months ago afsanarupa1 • 0

Some comments: It looks like you have a challenge similar to the one recently posted here: Result from GSEA for non-model organism not as expected. Please read that thread in full.

All gene set analyses, whether the overrepresentation analysis (ORA) or the gene set enrichment analysis (GSEA) variant, assume that the input is a (ranked) list of genes, not of transcripts.

It is also expected that all input IDs are unique. Again, see the thread referred to above.
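As a rough illustration of both points (gene-level input and unique IDs), here is a minimal Python sketch. It assumes Trinity-style transcript names, where the gene ID is the transcript ID with the trailing _i<N> isoform suffix removed. The transcript IDs and their KO assignments below are made up for the example, and keeping the first KO seen per gene is only one possible rule, not a recommendation.

```python
import re

def trinity_gene_id(transcript_id):
    """Strip the trailing isoform suffix (_i1, _i2, ...) from a Trinity ID."""
    return re.sub(r"_i\d+$", "", transcript_id)

def collapse_to_genes(transcript_to_ko):
    """Collapse (transcript, KO) pairs to one KO per gene, keeping the first KO seen."""
    gene_to_ko = {}
    for transcript_id, ko in transcript_to_ko:
        gene_to_ko.setdefault(trinity_gene_id(transcript_id), ko)
    return gene_to_ko

# Hypothetical example rows (not real annotation results).
rows = [
    ("TRINITY_DN100_c0_g1_i1", "K10093"),
    ("TRINITY_DN100_c0_g1_i2", "K10093"),
    ("TRINITY_DN200_c1_g2_i1", "K06453"),
]
gene_to_ko = collapse_to_genes(rows)
for gene, ko in gene_to_ko.items():
    print(gene, ko)
```

The same collapsing step would also be needed for the differential expression results, so that both tables use the same gene-level IDs.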

Hope this helps!

ADD REPLY • link 16 months ago Guido Hooiveld ★ 3.9k

As I understand it, clusterProfiler requires an OrgDb, but since I am working with a non-model organism, no such database is available. What should I do in this case?

ADD REPLY • link 16 months ago afsanarupa1 • 0

No, then you misunderstood: clusterProfiler does NOT necessarily need an OrgDb!

It basically needs 2 inputs: a data.frame for the argument TERM2GENE, and a data.frame for the argument TERM2NAME. These are then used by the generic ORA function enricher, or the generic GSEA function GSEA. This is what is highlighted in the recent thread I referred to above, in which (also?) KO/ko IDs are used.

The functions enrichKEGG and gseKEGG are rather convenience functions that allow you to easily perform a KEGG-based ORA or GSEA analysis, respectively, for organisms for which an OrgDb is available.

You may also want to check the section on 'Universal enrichment analysis' here, or my post here: what the test method for enrichGO in clusterProfiler?.
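To make the two inputs concrete: TERM2GENE is a two-column table with the term (e.g. pathway) ID first and the gene ID second, and TERM2NAME is a two-column table with the term ID first and its description second. The Python sketch below builds both as tab-separated text by joining a gene-to-KO table with a KO-to-pathway table. The gene IDs, pathway IDs, pathway names, KO-to-pathway links and helper names are all placeholders invented for illustration; the real KO-to-pathway links would have to come from KEGG.

```python
import csv
import io

# Placeholder inputs (invented for illustration, not real KEGG content).
gene_to_ko = {"gene1": "K10093", "gene2": "K06453", "gene3": "K06453"}
ko_to_pathways = {"K10093": ["pathA"], "K06453": ["pathA", "pathB"]}
pathway_names = {"pathA": "Example pathway A", "pathB": "Example pathway B"}

def build_term2gene(gene_to_ko, ko_to_pathways):
    """Return sorted, unique (term, gene) pairs: column 1 = term, column 2 = gene."""
    pairs = set()
    for gene, ko in gene_to_ko.items():
        for pathway in ko_to_pathways.get(ko, []):
            pairs.add((pathway, gene))
    return sorted(pairs)

def to_tsv(header, rows):
    """Serialize a header and rows as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

term2gene = build_term2gene(gene_to_ko, ko_to_pathways)
term2name = sorted(pathway_names.items())

print(to_tsv(["term", "gene"], term2gene))
print(to_tsv(["term", "name"], term2name))
```

Tables written this way could then be read into R (for example with read.delim) and passed as TERM2GENE and TERM2NAME to enricher or GSEA.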

ADD REPLY • link 16 months ago Guido Hooiveld ★ 3.9k

Thanks a lot for your reply! Can you please tell me about the format of these two data.frames? I want to know what these two files are. I am a beginner in this field, so I am running into many problems with this.

ADD REPLY • link 16 months ago afsanarupa1 • 0

I have the following two files.

File 1, from the blastx result against the KEGG database:

swissprot_id    blasx_hit
Q86UKO          K10093
P4I17O          K06453

File 2 is the DEG results. What is the next step I should take to perform the KEGG enrichment analysis?

ADD REPLY • link 16 months ago afsanarupa1 • 0

score 1 · Answer 1 · 2022-12-08

If you look at the KEGG species list, you won't find your species there, which will make it difficult to do anything out of the box. You need some sort of mapping from the IDs you have in hand to KEGG or GO identifiers. If you can come up with such a mapping, you can always use the kegga function in limma, which allows you to provide your own gene-to-pathway mapping as a data.frame (the gene.pathway argument). Unfortunately, working with non-model organisms can be difficult.
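For readers wondering what such an overrepresentation test computes once a mapping exists, here is a minimal, self-contained Python sketch of a one-sided hypergeometric test per pathway. It illustrates the general idea only and is not the actual kegga or enricher implementation; the gene sets and function names are invented, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    total = comb(N, n)
    hits = range(k, min(K, n) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / total

def overrepresentation(de_genes, universe, pathway_to_genes):
    """Return {pathway: (overlap, pathway_size, p_value)}, restricted to the universe."""
    universe = set(universe)
    de = set(de_genes) & universe
    results = {}
    for pathway, genes in pathway_to_genes.items():
        members = set(genes) & universe
        overlap = len(de & members)
        p = hypergeom_upper_tail(overlap, len(members), len(de), len(universe))
        results[pathway] = (overlap, len(members), p)
    return results

# Invented toy data: 10 annotated genes, 3 of them differentially expressed.
universe = [f"gene{i}" for i in range(1, 11)]
pathway_to_genes = {"pathA": ["gene1", "gene2", "gene3"], "pathB": ["gene4", "gene5"]}
de_genes = ["gene1", "gene2", "gene9"]

results = overrepresentation(de_genes, universe, pathway_to_genes)
for pathway, (overlap, size, p) in results.items():
    print(pathway, overlap, size, round(p, 4))
```

Note that the universe here is the set of all annotated genes that were tested, not just the differentially expressed ones; the choice of universe strongly affects the p-values.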