Question

Urgent: Do GAGE applicable for non model species?

0

Entering edit mode

jackipchiho ▴ 10

@jackipchiho-8282

Last seen 10.4 years ago

Hong Kong

Hi ALL,

Since I am working on non model species green mussel and intertidal whelk, I would like to know can I use GAGE to perform gene set analysis for control and chemical treatment groups (n=2 / n=3)? If yes, how can I do it?

I have done de novo transcriptome and worked out the KEGG (Ko and K number without specific species) and GO (GO number without specific species). Is that possible to customize my own gene annotation file and perform available analysis in GAGE?

Can anyone help me about that? I have read the GAGE and pathview doc, but I cannot figure out the way to do so. I am running out of time to finish my MPhil study ><".

Many thanks,

Jack

gage • 4.0k views

ADD COMMENT • link updated 10.6 years ago by amywingsze • 0 • written 10.6 years ago by jackipchiho ▴ 10

0

Entering edit mode

Dear Jack,

I think I'm on the same track as yours. Can you please tell me how did you attach the KO number to the genes from non-model species? Thank you very much.

Sincerely,

Kang

ADD REPLY • link 8.9 years ago dukecomeback • 0

0

Entering edit mode

bioDBNet (https://biodbnet-abcc.ncifcrf.gov/) turns out help me really a lot!

ADD REPLY • link 8.9 years ago dukecomeback • 0

score 0 · Answer 1 · 2015-06-27

0

Entering edit mode

Luo Weijun ★ 1.6k

@luo-weijun-1783

Last seen 2.6 years ago

United States

I don’t think that your research species are part of KEGG at this time. however, if you label your genes with ko number, then you can generate KO gene set data first: kg.ko=kegg.gsets("ko") kegg.gs=kg.ko$kg.sets[kg.ko$sigmet.idx] you can then follow the gage examples to do the KEGG pathway analysis. Note that the row names (or names) of your gene data matrix need to be ko numbers. To check the gene IDs in ko numbers fpr both the gene set and your data, you can do: lapplykegg.gs[1:3], head, 3) head(your.data) GO analysis is similar if you can generate your GO gene set data with the gene IDs you have.

ADD COMMENT • link 10.6 years ago Luo Weijun ★ 1.6k

0

Entering edit mode

I have a follow-up question about this approach.

I ran KAAS on my non-model organism, and I understand your suggestion above changing the gene IDs (i.e. the row names of an expression data matrix) to the KEGG Orthology (KO) numbers assigned by KAAS before running gage.

Since the multiple genes might be assigned to the same KO number, what will happen if the input (exprs argument) to gage has non-unique row names)?

ADD REPLY • link 6.7 years ago psutton • 0

score 0 · Answer 2 · 2015-06-28

I have created a kegg gene set (Rc_kegg.txt) based on my own annotation data:

Ko00010 Glycolysis / Gluconeogenesis comp10418997_c0 comp11135183_c0 comp11547_c0 comp124788_c0 ......

Ko00020 Citrate cycle (TCA cycle) comp106429_c0 comp11438380_c0 comp11852469_c0 comp124788_c0 ........

Ko00030 Pentose phosphate pathway comp11085364_c0 comp11547_c0 comp17167857_c0 comp17594_c1 .......

And the expression data (Rc_30BgF.txt)

CgF1 CgF2 CgF3 BgF1 BgF2 BgF3

comp579412_c0   9.88   11.7   13.89   9.42   9.74 8.95
comp579413_c0   0.59   1.74   0.23   2.91   0.69 1.56
comp579422_c0   7.26   3.58   8.02   1.84   6.2 3.6

Is that okay for perform GAGE? I don't how to create my own kegg gene set in R, could you use the above file name to demonstrate once?

Many thanks

score 0 · Answer 3 · 2015-06-28

0

Entering edit mode

jackipchiho ▴ 10

@jackipchiho-8282

Last seen 10.4 years ago

Hong Kong

I tried the following way to do the analysis based on my own kegg gene set data:

> filename=system.file("extdata/Rckegg.gmt", package = "gage")
> Rckegg.gs=readList(filename)

> data(Rc_30BgF)
> Cg=1:3
> Bg=4:6
> Rc.kegg.unp <- gage(Rc_30BgF, gsets = Rckegg.gs, ref = Cg, samp = Bg, compared = "unpaired")

Is that okay to work in this way?

Many thanks

ADD COMMENT • link 10.6 years ago jackipchiho ▴ 10

0

Entering edit mode

Gage provide an function, readList(), to read in gene data in gmt file, and readExpData() function to read in expression data. Check tutorial dataPrep.pdf for details. You were using the right function, readList(). however, you cannot copy and paste the demo code in the tutorials literally. The following lines load data pre-existing with the gage package, and won’t work for you own data. filename=system.file("extdata/Rckegg.gmt", package = "gage") … data(Rc_30BgF) you should specify the full path + file name to your data files, and use readList() and readExpData() (or read.table() R function) to read in your gene list and expression data. You may want to check the detailed info for these functions too. These are R basics and not specific to gage package, you should really get yourself familiar with R to use gage or any other package/utilities in R.

ADD REPLY • link 10.6 years ago Luo Weijun ★ 1.6k

0

Entering edit mode

Thanks Weijun, I have revised the code:

>list.files()

"Rckegg.gs.gmt" "Rcg30.txt"

>Rckegg.gs <- readList("Rckegg.gs.gmt")

>headRckegg.gs[1:2])

$Ko00010
[1] "comp10418997_c0" "comp11135183_c0" "comp11547_c0" .......

$Ko00020
[1] "comp106429_c0" "comp11438380_c0" "comp11852469_c0" "comp124788_c0" ........

>Rcg30 <- readExpData("Rcg30", row.names=1)

>head(Rcg30)

Cg1 Cg2 Cg3 Bg1 Bg2 Bg3 Pg1 Pg2 Pg3
comp10000024_c0 1 2 3 4 5 6 7 8 9
comp10000066_c0 1 2 3 4 5 6 7 8 9

am I using the right function of readList and readExpData?

Besides, in my expt. I have one control and two treatment groups with two time-point in four types of tissue samples.

If I would like to reveal the toxicity mechanism/pathway between (1) two chemicals or (2) exposure time (time point 1 and time point 2) in same chemical, do I run gage for control vs treatment 1 and control vs treatment 2 one by one? total 16 comparison and compare (1) treatments and (2) time point using gagaComp. or use gagePipe and gageComp to do it?

Do gageComp able to compare (control vs treatment 1) and (control vs treatment 2) or time point 1 vs time point 2?

Thanks for your great help, since I have the first student to work with NGS in my Lab, I need to try in diff. way and seek help for expert ^ ^, like Weijun, to correct me.

Many thanks,

ADD REPLY • link 10.6 years ago jackipchiho ▴ 10

score 0 · Answer 4 · 2015-06-29

Hi Weijun,

I am facing exactly the same problem as Jack does! The plant species that I am working on is not a model species and also distantly related to Arabidopsis. I have annotated my de novo assembled transcriptome with Arabidopsis database and ko number. I would like to use the KO gene set data for enrichment analysis as you mentioned above. But there are multiple transcripts/isoforms that are annotated with the same ko number based on homology search. Would gage accept input that has same rowname(ko number) appearing more than once in the expression data file? Or should I just added up the FPKM value of those with the same ko number?

Thanks for your help in advance!

Amy.