I am working with RNA-seq data from a non-model organism, Tenualosa ilisha. De novo assembly was performed using Trinity. For KEGG enrichment analysis, I first ran blastx against the KEGG database. The BLAST result gave me an output file in which one column contains the KO ID. I also performed differential expression analysis of the transcripts. So I have two files: one containing transcript IDs and KO IDs, and the other containing the differential expression results for the transcripts. With these, how can I go forward with KEGG enrichment analysis? Can anyone please provide a complete pipeline, with code and a step-by-step process, to do this?
Any advice would be much appreciated! Thanks in advance!
Some comments: It looks like you have a similar challenge to the one recently posted here: Result from GSEA for non-model organism not as expected. Please read that thread completely.
All gene set analyses, whether the overrepresentation analysis (ORA) or the gene set enrichment analysis (GSEA) variant, assume that the input is a (ranked) list of genes, not of transcripts.
It is also expected that all input IDs are unique. Again, see the thread referred to above.
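To illustrate the two points above (gene-level input, unique IDs), here is a minimal Python sketch that collapses transcript-level KO annotations to gene level. It assumes your transcript IDs follow Trinity's usual naming, where a trailing `_i<N>` marks the isoform; the transcript IDs are made up, and the function names are hypothetical helpers, not part of any package.

```python
import re
from collections import defaultdict

# Assumption: IDs look like "TRINITY_DN1000_c115_g5_i1", where the
# trailing "_i<N>" is the isoform and the rest is the Trinity "gene".
ISOFORM_SUFFIX = re.compile(r"_i\d+$")

def transcript_to_gene(transcript_id):
    """Strip the isoform suffix to obtain the gene-level ID."""
    return ISOFORM_SUFFIX.sub("", transcript_id)

def collapse_to_genes(transcript_to_ko):
    """Map each gene ID to the set of KO IDs found on any of its isoforms."""
    gene_to_kos = defaultdict(set)
    for transcript_id, ko in transcript_to_ko:
        gene_to_kos[transcript_to_gene(transcript_id)].add(ko)
    return dict(gene_to_kos)

# Toy input: made-up transcript IDs paired with KO IDs.
pairs = [
    ("TRINITY_DN1_c0_g1_i1", "K10093"),
    ("TRINITY_DN1_c0_g1_i2", "K10093"),  # second isoform of the same gene
    ("TRINITY_DN2_c0_g1_i1", "K06453"),
]
gene_to_kos = collapse_to_genes(pairs)
```

Because the values are sets, each gene ID appears once and duplicate KO hits from several isoforms are merged. The differential expression results would need the same transcript-to-gene collapsing (or a gene-level analysis from the start) before being used as input.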
Hope this helps!
As I understand it, clusterProfiler requires an OrgDb, but since I am working with a non-model organism, no such database is available. What should I do in this case?
No, then you misunderstood: `clusterProfiler` does NOT necessarily need an `OrgDb`! It basically needs 2 inputs: a `data.frame` for the argument `TERM2GENE`, and a `data.frame` for the argument `TERM2NAME`. These are then used for the generic ORA function `enricher`, or the generic GSEA function `GSEA`. This is what is highlighted in the recent thread I referred to above, in which (also?) KO/ko ids are used.
The functions `enrichKEGG` and `gseKEGG` are rather convenience functions that allow you to easily perform a KEGG-based ORA or GSEA analysis, respectively, for organisms for which an `OrgDb` is available.
You may also want to check the section on 'Universal enrichment analysis' here, or my post here: what the test method for enrichGO in clusterProfiler?.
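To make the idea of the two tables more concrete, here is a small Python sketch of a term-to-gene table, a term-to-name table, and a hypergeometric overrepresentation test computed from them. This is not clusterProfiler code, only an illustration of the kind of calculation a generic ORA performs. All term IDs, gene IDs, and names are toy placeholders, the background is assumed to be every gene that appears in the table, and no multiple-testing correction is applied.

```python
from math import comb

# TERM2GENE-style table: one (term, gene) pair per row.
# TERM2NAME-style table: one human-readable name per term.
# Everything below is a toy placeholder, not a real KEGG entry.
term2gene = [
    ("pathA", "g1"), ("pathA", "g2"), ("pathA", "g3"),
    ("pathB", "g3"), ("pathB", "g4"), ("pathB", "g5"), ("pathB", "g6"),
]
term2name = {"pathA": "Toy pathway A", "pathB": "Toy pathway B"}

def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when drawing N genes from M, of which n belong to the term."""
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / comb(M, N)

def ora(de_genes, term2gene):
    """Return {term: p-value} for a list of differentially expressed genes."""
    universe = {gene for _, gene in term2gene}  # assumed background
    de = set(de_genes) & universe               # ignore unannotated genes
    members_by_term = {}
    for term, gene in term2gene:
        members_by_term.setdefault(term, set()).add(gene)
    return {
        term: hypergeom_upper_tail(len(de & members), len(universe),
                                   len(members), len(de))
        for term, members in members_by_term.items()
    }

pvalues = ora(["g1", "g2", "g3"], term2gene)
for term, p in sorted(pvalues.items(), key=lambda item: item[1]):
    print(term, term2name[term], round(p, 4))
```

In a real analysis the first column would hold the pathway (or other term) IDs and the second column the gene-level IDs used in the differential expression results, so the same two-column layout carries over to the `data.frame` objects mentioned above.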
Thanks a lot for your reply! Can you please tell me about the format of these two data.frames? I want to know what these two files are. I am a beginner in this field, so I am facing many problems with this.
I have the following two files. File 1, from the blastx result against the KEGG database:
swissprot_id    blasx_hit
Q86UKO          K10093
P4I17O          K06453
File 2 is the DEG results. What is the next step I should take to perform KEGG enrichment analysis?