Question: How to select the genes mapped to an enriched KEGG pathway (kegga)
mat149 wrote (3.1 years ago):

Hello,

 

I used the kegga function to run a pathway enrichment analysis on my dataset. I would like to obtain a data frame (or any other format) listing which genes are "up" and "down" for a given KEGG pathway. For example, I want to select all the genes counted in the "N", "Up" and "Down" columns for the row "path:dre04020".

I would also like to do this for a gene ontology enrichment table returned from goana.  For example, I would like to select/extract all genes from the N/up/down columns for row "GO:0022613".

I tried using the select function but haven't had much luck. Does anyone know if this can be done?

 

Thanks,

Matt

 

              Pathway                                     N   Up  Down      P.Up    P.Down
path:dre01100 Metabolic pathways                       1074  291    27  4.29E-85  0.000174
path:dre04020 Calcium signaling pathway                 216   70     4  3.03E-24  0.266749
path:dre01200 Carbon metabolism                         114   47     2  1.27E-21  0.403579
path:dre04080 Neuroactive ligand-receptor interaction   313   77    13  1.99E-18  0.000105
path:dre00010 Glycolysis / Gluconeogenesis               70   32     1  1.02E-16  0.575509

 

   

           Term                                  Ont    N  Up  Down  P.Up    P.Down
GO:0022613 ribonucleoprotein complex biogenesis  BP   205   0    87     1  1.65E-60
GO:0034660 ncRNA metabolic process               BP   215   0    87     1  2.22E-58
GO:0005730 nucleolus                             CC   132   0    68     1  1.93E-54
GO:0042254 ribosome biogenesis                   BP   143   0    69     1  1.04E-52
GO:0034470 ncRNA processing                      BP   163   0    72     1  1.45E-51

modified 3.1 years ago by Gordon Smyth • written 3.1 years ago by mat149
Answer: How to select the genes mapped to an enriched KEGG pathway (kegga)
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote (3.1 years ago):

This is the sort of thing that we don't provide special functions for in limma, but which you can do reasonably easily for yourself if you are confident in using R.

Suppose you have done a limma linear model analysis resulting in a fit object. You might look at the topTable:

tab <- topTable(fit, n=Inf)

Now you want to do a KEGG analysis. You could download the KEGG pathway annotation for zebrafish by:

GK <- getGeneKEGGLinks(species.KEGG = "dre")

Then you could do the KEGG analysis:

k <- kegga(fit, species.KEGG="dre", gene.pathway=GK)
topKEGG(k)

Now if you want to see the top-table genes that belong to a particular pathway (say path:dre01100), it is just a matter of using standard subsetting operations:

i <- tab$GeneID %in% GK$GeneID[GK$PathwayID=="path:dre01100"]
tab[i,]

Edit:

The above code assumes your gene IDs are stored in the "GeneID" column. If you have stored them under a different column name, then you need to make the obvious change, such as:

i <- tab$ENTREZID %in% GK$GeneID[GK$PathwayID=="path:dre01100"]
tab[i,]

You might also need to use as.character(tab$ENTREZID) if you have stored the column as a number or as a factor.
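To get separate "up" and "down" gene lists for one pathway, the same subsetting can be combined with the sign of the log fold change. The following is only a minimal sketch on made-up data: `pathway_genes` is a hypothetical helper, not a limma function, and it assumes a single-coefficient top table with `logFC` and `adj.P.Val` columns, and that the Up/Down counts correspond to genes passing the FDR cutoff with positive or negative fold change.

```r
# Toy stand-ins for topTable() and getGeneKEGGLinks() output (made-up IDs and values).
tab <- data.frame(
  GeneID    = c("1", "2", "3", "4", "5"),
  logFC     = c(2.0, -1.5, 0.2, 3.1, -2.2),
  adj.P.Val = c(0.001, 0.004, 0.80, 0.020, 0.003),
  stringsAsFactors = FALSE
)
GK <- data.frame(
  GeneID    = c("1", "2", "3", "5"),
  PathwayID = c("path:dre04020", "path:dre04020", "path:dre04020", "path:dre01100"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split the top-table rows of one pathway into all / up / down.
pathway_genes <- function(tab, GK, pathway, id_col = "GeneID", fdr = 0.05) {
  ids     <- GK$GeneID[GK$PathwayID == pathway]
  in_path <- tab[as.character(tab[[id_col]]) %in% ids, , drop = FALSE]
  sig     <- in_path$adj.P.Val < fdr   # assumed significance rule
  list(
    all  = in_path,
    up   = in_path[sig & in_path$logFC > 0, , drop = FALSE],
    down = in_path[sig & in_path$logFC < 0, , drop = FALSE]
  )
}

res <- pathway_genes(tab, GK, "path:dre04020")
res$up$GeneID    # genes called "up" in the pathway
res$down$GeneID  # genes called "down" in the pathway
```

With real data, `tab` and `GK` would come from `topTable()` and `getGeneKEGGLinks()` as above, and `id_col` would be set to whichever column holds the Entrez IDs (for example `"ENTREZID"`).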

ADD COMMENTlink modified 3.0 years ago • written 3.1 years ago by Gordon Smyth39k

Hey Gordon, 

I tried your script but I had to modify a few things to run the KEGG pathway enrichment:

entzzvec<-as.vector(data.fit.eb$genes$ENTREZID)
tab <- topTable(data.fit.eb,lfc=1.5,adjust="BH",p.value=0.01, n=Inf)
GK <- getGeneKEGGLinks(species.KEGG = "dre")
k <- kegga(data.fit.eb,coef=1,geneid=entzzvec,species.KEGG="dre", gene.pathway=GK,FDR=0.01)
topKEGG(k)
i <- tab$GeneID %in% GK$GeneID[GK$PathwayID=="path:dre01100"]
tab[i,]

 

When I run the last line, it returns:

> tab[i,]
 [1] PROBEID            ID                 SYMBOL             GENENAME          
 [5] ENTREZID           morphant...control rescue...control   morphant...rescue 
 [9] AveExpr            F                  P.Value            adj.P.Val         
<0 rows> (or 0-length row.names)

 

I am not sure what to do from here. Am I getting close?

 

written 3.0 years ago by mat149

You are using tab$GeneID when you don't actually have such a column. So naturally i contains nothing and tab[i,] has zero rows. See my edit above.
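The silent empty result can be reproduced on a toy data frame (made-up values; `pathway_ids` is a hypothetical stand-in for the gene IDs of one pathway). Asking for a column that does not exist returns NULL rather than an error, so the index ends up with length zero:

```r
# Toy top table with an ENTREZID column but no GeneID column (made-up values).
tab <- data.frame(ENTREZID = c("1", "2", "3"), AveExpr = c(5.2, 7.1, 6.4),
                  stringsAsFactors = FALSE)
pathway_ids <- c("1", "3")  # hypothetical gene IDs for one pathway

is.null(tab$GeneID)               # TRUE: the column does not exist
i <- tab$GeneID %in% pathway_ids  # logical(0), with no error or warning
nrow(tab[i, ])                    # 0

# Checking the column names first makes the mismatch obvious.
colnames(tab)
j <- as.character(tab$ENTREZID) %in% pathway_ids
tab[j, ]
```

A quick look at `colnames(tab)` before subsetting is usually enough to catch this kind of mismatch.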

modified 3.0 years ago • written 3.0 years ago by Gordon Smyth

Haha thanks for pointing that out (as.character(tab$ENTREZID)), I missed it completely :X

this works, thank you!

written 3.0 years ago by mat149

Hey,

 

I have been struggling to emulate this with GO categories. I am stuck because I do not know how to generate an R object containing all GO term annotations for "Dr", analogous to GK <- getGeneKEGGLinks(species.KEGG = "dre"). Beyond that, would i <- tab$GeneID %in% GK$GeneID[GK$PathwayID=="path:dre01100"] then need to be changed to i <- tab$GeneID %in% GK$GeneID[GK$TermID=="GO:0004984"] for gene ontologies?

 

 

Here is an example of five categories I am interested in.

           Term                         Ont    N  Up  Down      P.Up    P.Down
GO:0004984 olfactory receptor activity  MF   102  48     0  1.15E-30         1
GO:0050877 neurological system process  BP   258  63    47  3.42E-21  0.010118
GO:0007601 visual perception            BP    61   2    32  0.905847   1.7E-13
GO:0008066 glutamate receptor activity  MF    22   0    18         1  4.65E-13
GO:0045202 synapse                      CC   167   2    68  0.999791  2.59E-19

 

entzzvec<-as.vector(data.fit.eb$genes$ENTREZID)
tab1 <- topTable(data.fit.eb,lfc=1.5,adjust="BH",p.value=0.01, n=Inf,coef=1)
??? GG<-GET GO TERM ANNOTATIONS ???
MOgo <- goana(data.fit.eb, coef=1,FDR=0.01,geneid=data.fit.eb$genes$ENTREZID,species="Dr")
topMOgo<-topGO(MOgo,number=50)
i <- tab1$GeneID %in% GG$GeneID[GG$TermID=="GO:0004984"]
tab1[i,]

 

 

 

 

 

Thanks,

Matt

written 3.0 years ago by mat149
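One possible way to approach the GO case, offered only as a sketch: a gene-to-term table can be subset in the same way as the KEGG table. The helper `genes_in_term` and the toy data below are hypothetical. The commented-out lines assume the org.Dr.eg.db annotation package is installed and that converting its `org.Dr.egGO2ALLEGS` map with `AnnotationDbi::toTable()` gives a data frame with `gene_id` and `go_id` columns; note that those column names differ from the `GeneID`/`TermID` guessed in the question.

```r
# Hypothetical helper: rows of a top table whose genes are annotated to a GO term.
# GG is expected to have columns gene_id and go_id.
genes_in_term <- function(tab, GG, term, id_col = "ENTREZID") {
  ids <- GG$gene_id[GG$go_id == term]
  tab[as.character(tab[[id_col]]) %in% ids, , drop = FALSE]
}

# Toy stand-ins (made-up gene IDs and values) so the sketch runs on its own.
GG_toy <- data.frame(
  gene_id = c("101", "102", "103"),
  go_id   = c("GO:0004984", "GO:0004984", "GO:0007601"),
  stringsAsFactors = FALSE
)
tab_toy <- data.frame(ENTREZID = c(101, 103, 104), logFC = c(2.1, -1.8, 0.3))

res <- genes_in_term(tab_toy, GG_toy, "GO:0004984")
res

# With real data (assumes org.Dr.eg.db and AnnotationDbi are installed):
# GG <- AnnotationDbi::toTable(org.Dr.eg.db::org.Dr.egGO2ALLEGS)
# genes_in_term(tab1, GG, "GO:0004984")
```

The `as.character()` call guards against Entrez IDs stored as numbers or factors, as mentioned in the answer above.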