Question: Discrepancy in gene set size between gage() and geneData()
serpalma.v (Germany) wrote 21 months ago:

Dear community, 

I have run GAGE and Pathview on my microarray data. Everything worked fine.

However, the number of genes assigned to a KEGG pathway by the gage() function differs from the number of genes returned by the geneData() function, which is the function used to extract the gene IDs assigned to a significantly enriched pathway.

For example, one pathway has 181 genes in the variable "set.size" after using gage(), but when that pathway is passed to geneData() with the whole expression set, it returns 196. I tried two more pathways and got similar results: 34 vs 37 and 15 vs 17.

How can I know exactly which are the genes that are contained in the variable "set.size" after using gage()?

Many thanks!

Luo Weijun (United States) wrote 21 months ago:
geneData() only extracts and/or visualizes the expression data for the selected gene set(s). Its output txt file should not have a column called set.size. You may count the number of rows as the set size (or number of genes mapped). But there can be duplicated entries (gene/transcript IDs or rownames in your input data) mapped to the same gene in the gene set. In other words, the set.size column in the gage() output counts the unique genes present in your input data, but the number of rows in the geneData() output may be larger than that due to duplicated mapping. Hope this makes sense.
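
A minimal base-R sketch of the point above, using made-up data. The matrix `expr`, its probe IDs, and the mapping vector `probe2gene` are all hypothetical stand-ins for your own input and mapping; the sketch does not call gage() or geneData(), it only shows why a row count can exceed a unique-gene count.

```r
# Toy expression matrix: 5 probes (rows) x 2 samples. All values are made up.
expr <- matrix(
  c(1.2, 0.8, 2.1, 1.9, 0.5,
    1.0, 0.9, 2.3, 1.7, 0.4),
  ncol = 2,
  dimnames = list(c("p1", "p2", "p3", "p4", "p5"), c("ctrl", "treat"))
)

# Hypothetical probe-to-gene mapping: p1/p2 share a gene, and so do p3/p4.
probe2gene <- c(p1 = "geneA", p2 = "geneA", p3 = "geneB",
                p4 = "geneB", p5 = "geneC")

# Gene ID for every row of the input data
genes <- probe2gene[rownames(expr)]

n_rows   <- length(genes)          # analogous to counting rows of the extracted data
n_unique <- length(unique(genes))  # analogous to a count of unique genes

# Genes represented by more than one row
dup_genes <- unique(genes[duplicated(genes)])
```

Here the row count is larger than the unique-gene count because two genes are each represented by two rows, which is the same kind of gap as 196 vs 181.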
serpalma.v (Germany) wrote 21 months ago:

Thanks Luo. I have checked the IDs in the "Gene" column of the table produced by geneData(), and as you said, some IDs are repeated. Once I apply unique(), I get the proper length.

One more thing. I see more genes in the set.size column than in the actual pathway. For example, "ssc04340 Hedgehog signaling pathway" is significantly enriched with 34 genes, but when I look at the pathway map I count far fewer than that, only 19.

Is this because several genes are assigned to the same gene product, or because they belong to another map associated with the pathway?

 

Cheers!

You are right on this. As described in the vignette, in the native KEGG view a gene node may represent multiple genes/proteins with similar or redundant functional roles. The number of member genes per node ranges from 1 up to several tens.
(Reply written 20 months ago by Luo Weijun)
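
The node-versus-gene distinction in the reply above can be illustrated with a small base-R sketch. The table `pathway`, its node labels, and its gene IDs are hypothetical and are not taken from KEGG or from any pathview output; the sketch only shows how a count of boxes on a map can be smaller than a count of member genes.

```r
# Hypothetical pathway layout: one row per gene, 'node' is the box it is drawn in.
pathway <- data.frame(
  node = c("n1", "n1", "n1", "n2", "n3", "n3"),
  gene = c("g1", "g2", "g3", "g4", "g5", "g6"),
  stringsAsFactors = FALSE
)

n_nodes <- length(unique(pathway$node))  # boxes you would count on the map
n_genes <- length(unique(pathway$gene))  # genes that count toward the set size

# Number of member genes behind each node
genes_per_node <- table(pathway$node)
```

In this toy layout, three visible nodes stand for six genes, which mirrors counting 19 boxes on a map whose gene set has 34 members.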