Question: Discrepancy in gene set size between gage() and geneData()
3.5 years ago by
serpalma.v40
Germany
serpalma.v40 wrote:

Dear community,

I have run GAGE and Pathview on my microarray data, and everything ran without errors.

However, the number of genes assigned to a KEGG pathway by gage() differs from the number of genes returned by geneData(), the function used to extract the gene IDs assigned to a significantly enriched pathway.

For example, one pathway has 181 genes in "set.size" after running gage(), but when that pathway is passed to geneData() with the whole expression set, it returns 196. I tried two more pathways and got similar results: 34 vs 37 and 15 vs 17.

How can I find out exactly which genes are counted in "set.size" after running gage()?

Many thanks!

Answer: Discrepancy in gene set size between gage() and geneData()
3.5 years ago by
Luo Weijun1.5k
United States
Luo Weijun1.5k wrote:
geneData() just extracts and/or visualizes the expression data for the selected gene set(s). Its output txt file should not have a column called set.size. You may count the number of rows as the set size (or number of genes mapped). But there can be duplicated entries (gene/transcript IDs or rownames in your input data) mapped to the same gene in the gene set. In other words, the set.size column in the gage() output counts the unique genes present in your input data, but the number of rows in the geneData() output may be larger than that due to duplicated mapping. Hope this makes sense.
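
The difference between the two counts can be sketched in base R. This is only an illustration under stated assumptions: `gene_col` is a hypothetical vector standing in for the gene ID column of a geneData() output table, and the IDs are made up. It is not taken from the gage or pathview code.

```r
# Hypothetical gene IDs, standing in for the ID column of a geneData()
# output table (made-up values, for illustration only).
gene_col <- c("100037", "100038", "100038", "100152", "100152", "100200")

# Number of rows in the table: duplicated mappings are counted each time.
n_rows <- length(gene_col)

# Each gene once: this is the count to compare against set.size.
uniq_ids <- unique(gene_col)
n_unique <- length(uniq_ids)

# IDs that occur more than once, i.e. the source of the discrepancy.
dup_ids <- unique(gene_col[duplicated(gene_col)])

cat("rows:", n_rows, " unique genes:", n_unique, "\n")
cat("IDs appearing more than once:", dup_ids, "\n")
```

If the row count minus the number of extra duplicate rows matches set.size, duplicated mapping accounts for the whole gap; `uniq_ids` is then the list of genes behind that number.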
Answer: Discrepancy in gene set size between gage() and geneData()
3.5 years ago by
serpalma.v40
Germany
serpalma.v40 wrote:

Thanks Luo. I have checked the IDs in the "Gene" column of the table produced by geneData() and, as you said, some IDs are repeated. Once I apply unique(), I get the expected length.

One more thing: I see more genes in the set.size column than in the actual pathway. For example, "ssc04340 Hedgehog signaling pathway" comes out as significantly enriched with 34 genes, but when I look at the pathway I count far fewer, only 19.

Is this because some genes are assigned to the same gene product? Or because they belong to another map associated with the pathway?

Cheers!