Question: How to select the genes mapped to an enriched KEGG pathway (kegga)
mat14940 wrote, 2.8 years ago:

Hello,

I used the kegga function to run a pathway enrichment analysis on my dataset. I would like to obtain a data frame (or any other format) listing which genes are "up" and "down" for a given KEGG pathway. For example, I want to select all genes counted in the "N", "Up" and "Down" columns for the row "path:dre04020".

I would also like to do this for a gene ontology enrichment table returned from goana.  For example, I would like to select/extract all genes from the N/up/down columns for row "GO:0022613".

I tried using the select function but haven't had much luck. Does anyone know if this can be done?

Thanks,

Matt

                                                         N  Up Down     P.Up   P.Down
path:dre01100 Metabolic pathways                      1074 291   27 4.29E-85 0.000174
path:dre04020 Calcium signaling pathway                216  70    4 3.03E-24 0.266749
path:dre01200 Carbon metabolism                        114  47    2 1.27E-21 0.403579
path:dre04080 Neuroactive ligand-receptor interaction  313  77   13 1.99E-18 0.000105
path:dre00010 Glycolysis / Gluconeogenesis              70  32    1 1.02E-16 0.575509

                                                Ont   N Up Down P.Up   P.Down
GO:0022613 ribonucleoprotein complex biogenesis BP  205  0   87    1 1.65E-60
GO:0034660 ncRNA metabolic process              BP  215  0   87    1 2.22E-58
GO:0005730 nucleolus                            CC  132  0   68    1 1.93E-54
GO:0042254 ribosome biogenesis                  BP  143  0   69    1 1.04E-52
GO:0034470 ncRNA processing                     BP  163  0   72    1 1.45E-51
Answer: How to select the genes mapped to an enriched KEGG pathway (kegga)
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 2.8 years ago:

This is the sort of thing that we don't provide special functions for in limma, but which you can do reasonably easily for yourself if you are confident in using R.

Suppose you have done a limma linear model analysis resulting in a fit object. You might look at the topTable:

tab <- topTable(fit, n=Inf)

Now suppose you want to do a KEGG analysis. You could download the KEGG pathway annotation for zebrafish by:

GK <- getGeneKEGGLinks(species.KEGG = "dre")

Then you could do the KEGG analysis:

k <- kegga(fit, species.KEGG="dre", gene.pathway=GK)
topKEGG(k)

Now if you want to see the top-table genes that belong to a particular pathway (say path:dre01100), it is just a matter of using standard subsetting operations:

i <- tab$GeneID %in% GK$GeneID[GK$PathwayID=="path:dre01100"]
tab[i,]

Edit: The above code assumes your gene IDs are stored in the "GeneID" column. If you have stored them under a different column name, then you need to make the obvious change, such as:

i <- tab$ENTREZID %in% GK$GeneID[GK$PathwayID=="path:dre01100"]
tab[i,]
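For anyone following along, here is a minimal sketch of the same subsetting on made-up data, extended to split the pathway's genes into up and down sets. The toy `tab` and `GK` objects, the ID values, and the 0.05 cutoff on `adj.P.Val` are all assumptions for illustration; in a real analysis the cutoff would presumably need to match the FDR given to kegga.

```r
# Toy stand-ins for the real objects; every value below is made up.
tab <- data.frame(
  ENTREZID  = c(101, 102, 103, 104, 105),   # stored as numbers on purpose
  logFC     = c(2.1, -1.8, 0.3, 1.6, -2.4),
  adj.P.Val = c(0.001, 0.004, 0.60, 0.02, 0.0005)
)
GK <- data.frame(
  GeneID    = c("101", "102", "104", "105", "103"),
  PathwayID = c("path:dre01100", "path:dre01100", "path:dre04020",
                "path:dre01100", "path:dre04020"),
  stringsAsFactors = FALSE
)

# Genes annotated to the pathway of interest
in.path <- GK$GeneID[GK$PathwayID == "path:dre01100"]

# Coerce the IDs to character as a precaution before matching
i <- as.character(tab$ENTREZID) %in% in.path
path.tab <- tab[i, ]

# Split by direction, using a hypothetical cutoff of adjusted p < 0.05
sig  <- path.tab$adj.P.Val < 0.05
up   <- path.tab[sig & path.tab$logFC > 0, ]
down <- path.tab[sig & path.tab$logFC < 0, ]
```

Here `path.tab` plays the role of the genes counted under N that appear in the top table, while `up` and `down` hold the rows that would be counted under Up and Down for that pathway.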

You might also need to use as.character(tab$ENTREZID) if you have stored the column as a number or as a factor.

Hey Gordon,

I tried your script but I had to modify a few things to run the KEGG pathway enrichment:

entzzvec<-as.vector(data.fit.eb$genes$ENTREZID)
tab <- topTable(data.fit.eb,lfc=1.5,adjust="BH",p.value=0.01, n=Inf)
GK <- getGeneKEGGLinks(species.KEGG = "dre")
k <- kegga(data.fit.eb,coef=1,geneid=entzzvec,species.KEGG="dre", gene.pathway=GK,FDR=0.01)
topKEGG(k)
i <- tab$GeneID %in% GK$GeneID[GK$PathwayID=="path:dre01100"]
tab[i,]

When I run the last line, it returns:

> tab[i,]
[1] PROBEID            ID                 SYMBOL             GENENAME
[5] ENTREZID           morphant...control rescue...control   morphant...rescue
<0 rows> (or 0-length row.names)

I am not sure what to do from here, am I getting close?

You are using tab$GeneID when you don't actually have such a column. So naturally i contains nothing and tab[i,] has zero rows. See my edit above.

Haha thanks for pointing that out (as.character(tab$ENTREZID)), I missed it completely :X

this works, thank you!
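A small sketch of why the zero-row result appears without any error, using a made-up two-row table (the data and the final guard are illustrative assumptions, not part of limma):

```r
# Made-up table that has ENTREZID but no GeneID column
tab <- data.frame(ENTREZID = c("101", "102"), logFC = c(1.2, -0.7),
                  stringsAsFactors = FALSE)

# Asking for a missing column with `$` returns NULL instead of failing
ids <- tab$GeneID
is.null(ids)

# NULL %in% anything gives a zero-length logical vector ...
i <- ids %in% c("101", "102")
length(i)

# ... and indexing rows with it keeps the columns but selects no rows
nrow(tab[i, ])

# A guard like this fails loudly if the expected column is absent
stopifnot("ENTREZID" %in% colnames(tab))
```

Checking colnames(tab) before subsetting is a cheap way to catch this kind of mismatch early.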

Hey,

I have been struggling to emulate this with GO categories. I am stuck because I do not know how to generate an R object containing all GO term annotations for "Dr", analogous to GK <- getGeneKEGGLinks(species.KEGG = "dre"). Beyond that, would i <- tab$GeneID %in% GK$GeneID[GK$PathwayID=="path:dre01100"] then need to be changed to i <- tab$GeneID %in% GK$GeneID[GK$TermID=="GO:0004984"] for gene ontologies?

Here is an example of five categories I am interested in.

           Term                        Ont   N Up Down     P.Up   P.Down
GO:0004984 olfactory receptor activity MF  102 48    0 1.15E-30        1
GO:0050877 neurological system process BP  258 63   47 3.42E-21 0.010118
GO:0007601 visual perception           BP   61  2   32 0.905847  1.7E-13
GO:0008066 glutamate receptor activity MF   22  0   18        1 4.65E-13
GO:0045202 synapse                     CC  167  2   68 0.999791 2.59E-19

entzzvec<-as.vector(data.fit.eb$genes$ENTREZID)
??? GG<-GET GO TERM ANNOTATIONS ???
MOgo <- goana(data.fit.eb, coef=1,FDR=0.01,geneid=data.fit.eb$genes$ENTREZID,species="Dr")
topMOgo<-topGO(MOgo,number=50)
i <- tab1$GeneID %in% GG$GeneID[GG$TermID=="GO:0004984"]
tab1[i,]

Thanks,

Matt
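One possible shape for the GO version, sketched on made-up data. The goana help page says GO annotation is taken from the organism package (org.Dr.eg.db for species="Dr"), so a gene-to-term table could presumably be pulled from there; the exact call in the comment and the `gene_id`/`go_id` column names are assumptions here and should be checked against the installed package. The toy `GG` and `tab1` values are invented.

```r
# Toy stand-in for a gene-to-GO table. With Bioconductor installed, a table
# of this shape could presumably be obtained with
#   AnnotationDbi::toTable(org.Dr.eg.db::org.Dr.egGO2ALLEGS)
# (an assumption, not run in this sketch).
GG <- data.frame(
  gene_id = c("201", "202", "203", "202"),
  go_id   = c("GO:0004984", "GO:0004984", "GO:0007601", "GO:0007601"),
  stringsAsFactors = FALSE
)

# Toy top table keyed by Entrez ID
tab1 <- data.frame(
  ENTREZID = c("201", "202", "203", "204"),
  logFC    = c(1.9, 2.3, -1.1, 0.2),
  stringsAsFactors = FALSE
)

# Same subsetting idea as for KEGG, with the GO term in place of the pathway
i <- as.character(tab1$ENTREZID) %in% GG$gene_id[GG$go_id == "GO:0004984"]
tab1[i, ]
```

The only changes from the KEGG case are the source of the annotation table and the names of its two columns.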