Question: Discrepancy in gene set size between gage() and geneData()
serpalma.v (Germany) wrote, 2.4 years ago:

Dear community, 

I have run GAGE and Pathview on my microarray data. Everything worked fine.

However, the number of genes assigned to a KEGG pathway by the gage() function differs from the number of genes returned by the geneData() function, which is the function used to extract the gene IDs assigned to a significantly enriched pathway.

For example, one pathway has 181 genes in the "set.size" column after using gage(), but when that pathway is passed to geneData() with the whole expression set, it returns 196. I tried two more pathways and saw similar discrepancies: 34 vs 37 and 15 vs 17.

How can I know exactly which are the genes that are contained in the variable "set.size" after using gage()?

Many thanks!

Luo Weijun (United States) wrote, 2.4 years ago:
geneData() just extracts and/or visualizes the expression data for the selected gene set(s). Its output txt file should not have a column called set.size. You may count the number of rows as the set size (the number of genes mapped), but there can be duplicated entries (gene/transcript IDs or row names in your input data) mapped to the same gene in the gene set. In other words, the set.size column in the gage() output counts the unique genes present in your input data, while the number of rows from geneData() may be larger because of duplicated mapping. Hope this makes sense.
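
A minimal base-R sketch of the row count versus unique-gene count described above. The table `gd` is a hypothetical stand-in for the text output of geneData(), with a "Gene" column as mentioned in this thread; the IDs, the `probe` column, and `gene_set` are made-up illustration values, not real gage or KEGG data.

```r
# Hypothetical gene set (made-up IDs, for illustration only)
gene_set <- c("101", "102", "103", "104")

# Hypothetical stand-in for a geneData() output table: one row per input
# probe/transcript, with the gene it maps to in the "Gene" column.
gd <- data.frame(
  Gene  = c("101", "101", "102", "103", "103"),
  probe = c("p1", "p2", "p3", "p4", "p5"),
  stringsAsFactors = FALSE
)

n_rows   <- nrow(gd)                 # row count, duplicates included
n_unique <- length(unique(gd$Gene))  # distinct genes, the figure to compare with set.size

# Genes that appear on more than one row (the source of the discrepancy)
dup_genes <- unique(gd$Gene[duplicated(gd$Gene)])

# Members of the gene set that are actually present in the table
measured <- intersect(gene_set, gd$Gene)
```

Here `measured` lists the individual genes behind the unique count, which is one way to see which members of a set were found in the data.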
serpalma.v (Germany) wrote, 2.4 years ago:

Thanks, Luo. I have checked the IDs in the "Gene" column of the table produced by geneData() and, as you said, some IDs are repeated. Once I apply unique(), I get the proper length.

One more thing: I see more genes in the set.size column than in the actual pathway. For example, "ssc04340 Hedgehog signaling pathway" is significantly enriched with 34 genes, but when I look at the pathway map I count far fewer, only 19.

Is this because some genes are assigned to the same gene product? Or because they belong to another map associated with the pathway?

Cheers!

Luo Weijun replied, 2.4 years ago:
You are right on this. As described in the vignette, in the native KEGG view a gene node may represent multiple genes/proteins with similar or redundant functional roles. The number of member genes per node ranges from 1 up to several tens.
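
To illustrate why a node count on the pathway map can be smaller than set.size, here is a toy base-R sketch. The node-to-gene table `node_map` is entirely hypothetical (made-up node and gene labels); it only mimics the idea that one KEGG node can stand for several member genes and is not taken from any real pathway.

```r
# Hypothetical node-to-gene mapping: each row assigns one gene to one
# node drawn on the pathway map (labels are made up for illustration).
node_map <- data.frame(
  node = c("n1", "n1", "n1", "n2", "n3", "n3"),
  gene = c("g1", "g2", "g3", "g4", "g5", "g6"),
  stringsAsFactors = FALSE
)

n_nodes <- length(unique(node_map$node))  # boxes you would count on the map
n_genes <- length(unique(node_map$gene))  # genes that a set size would count

# Number of member genes behind each node
genes_per_node <- table(node_map$node)
```

In this toy case three nodes carry six genes, which is the same kind of gap as 19 visible nodes versus a set size of 34.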