Question

Function kegga not working for species other than the conventional ones - Pathways do not overlap with universe


Raito92 ▴ 60

@raito92-20399

Last seen 3.6 years ago

Italy

Hi! I'm running the function 'kegga' to perform a pathway analysis on my olive tree data (a species for which no default annotation package exists), but I get this error...

[Image: screenshot of the error message]

I think the problem may be my Gene names, which aren't normal NCBI Gene IDs...

https://i.ibb.co/yBWCC2q/Gene-Names.png

Any suggestions? Thanks in advance!

If needed, here is my sessionInfo() -> https://i.ibb.co/tLkSqmH/session-Info.png

kegga limma pathway analysis kegg • 2.5k views

updated 6.8 years ago by Gordon Smyth 53k • written 6.8 years ago by Raito92 ▴ 60

score 3 · Accepted Answer · 2019-05-02


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 days ago

United States

You need to map those to NCBI Gene IDs, which is what KEGG uses. A quick Google search of a few of your gene names didn't bring anything up, so presumably you know what those IDs are and can figure out how to map them. If not, you need to find someone who does, which will probably be somewhere other than here.



Hi, thanks for your answer! Yes, I realised later that the answer could be to use the NCBI codes of the genome that mine was annotated against (the wild olive tree, whereas mine is the domestic olive tree). So it looks as if all I need to do is merge some data to obtain the NCBI codes first, and then pass those codes to kegga so that it can process them as if they were from wild olive!
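The merge-then-kegga idea described here could be sketched roughly as follows. This is only an illustration under assumptions: the project gene names, the NCBI Gene IDs, the column names and the cut-off are all made up, and the cross-reference table is assumed to come from whoever produced the annotation. The KEGG organism code "oeu" for wild olive is likewise an assumption that should be checked against the KEGG organism list before use.

```r
# Hypothetical IDs throughout: neither the project gene names nor the
# NCBI Gene IDs below are real; they only show the shape of the data.

# Differential expression results keyed by the project's own gene names.
de_results <- data.frame(
  project_id = c("OE_g001", "OE_g002", "OE_g003", "OE_g004"),
  adj_p      = c(0.001, 0.20, 0.03, 0.60),
  stringsAsFactors = FALSE
)

# Cross-reference table from project gene names to NCBI Gene IDs
# (assumed to be supplied by the annotation provider).
id_map <- data.frame(
  project_id = c("OE_g001", "OE_g002", "OE_g003"),
  ncbi_id    = c("900000001", "900000002", "900000003"),
  stringsAsFactors = FALSE
)

# Inner join: genes without an NCBI Gene ID (here OE_g004) are dropped.
mapped <- merge(de_results, id_map, by = "project_id")

# kegga() works on character vectors of gene IDs.
universe <- unique(mapped$ncbi_id)
de_genes <- unique(mapped$ncbi_id[mapped$adj_p < 0.05])

# With limma installed and internet access, the call would then look
# something like this ("oeu" is an assumed KEGG organism code):
# res <- limma::kegga(de_genes, universe = universe, species.KEGG = "oeu")
```

Passing the mapped universe explicitly keeps the background set limited to the genes that were actually tested and could be mapped.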

written 6.8 years ago by Raito92 ▴ 60

score 2 · Accepted Answer · 2019-05-05

You don't explain which genome build or what annotation you have used, so I am going to make some guesses.

I am guessing that you are using the Oe6 de novo olive genome build (http://denovo.cnag.cat/olive) published by the Spanish CNAG team just a couple of years ago.

I am also guessing that you have used the Oe6 GFF3 gene annotation file from the same site, which provides gene IDs using their own system.

Since CNAG's genome build is so recent, and since they use their own gene IDs unique to the Oe6 project, there is no way that you can expect Bioconductor or KEGG to know what the gene IDs mean. You need to obtain gene annotation information from CNAG's own website. If they don't cross-reference their own gene IDs to NCBI or Ensembl or GO, then no one else will be able to do so.

In my opinion, you should make sure that you know what the gene IDs themselves mean before you try to do any functional analysis such as GO or KEGG. CNAG provides a directory of functional annotation files, and these files are presumably what you need to use. You have to explore the files for yourself and contact the CNAG people if you need help.
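If those annotation files do turn out to assign genes to pathways (an assumption, not something confirmed in this thread), kegga can be given the gene-to-pathway table directly through its `gene.pathway` and `pathway.names` arguments, in which case the gene IDs do not have to be NCBI Gene IDs. A minimal sketch, with invented gene IDs, pathway IDs and pathway names:

```r
# All IDs and pathway names below are invented for illustration.

# Column 1: gene IDs (in whatever system the annotation uses).
# Column 2: pathway IDs.
gene_pathway <- data.frame(
  GeneID    = c("OE_g001", "OE_g002", "OE_g003",
                "OE_g003", "OE_g004", "OE_g005"),
  PathwayID = c("path:A", "path:A", "path:A",
                "path:B", "path:B", "path:B"),
  stringsAsFactors = FALSE
)

# Column 1: pathway IDs. Column 2: human-readable pathway names.
pathway_names <- data.frame(
  PathwayID   = c("path:A", "path:B"),
  Description = c("Hypothetical pathway A", "Hypothetical pathway B"),
  stringsAsFactors = FALSE
)

# Background set of all tested genes, and the differentially expressed subset.
universe <- paste0("OE_g00", 1:8)
de_genes <- c("OE_g001", "OE_g002", "OE_g003")

# Only attempt the enrichment test if limma is available.
if (requireNamespace("limma", quietly = TRUE)) {
  res <- limma::kegga(de_genes,
                      universe      = universe,
                      gene.pathway  = gene_pathway,
                      pathway.names = pathway_names)
  print(res)
}
```

Because both tables are supplied by the user, this form of the call should not need to look anything up on the KEGG website, but the results are only as meaningful as the annotation behind them.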