Discrepancy in gene set size between gage() and geneData()
serpalma.v (@serpalmav-8912), Germany

Dear community, 

I have run GAGE and Pathview on my microarray data, and everything worked fine.

However, the number of genes assigned to a KEGG pathway by the gage() function differs from the number of genes returned by the geneData() function, which is the function to use to extract the gene IDs assigned to a significantly enriched pathway.

For example, one pathway has 181 genes in the "set.size" column after running gage(), but when that pathway is passed to geneData() together with the whole expression set, it returns 196. I tried two more pathways and saw similar discrepancies: 34 vs 37 and 15 vs 17.

How can I know exactly which are the genes that are contained in the variable "set.size" after using gage()?

Many thanks!

Luo Weijun (@luo-weijun-1783), United States
geneData() just extracts and/or visualizes the expression data for the selected gene set(s). Its output txt file should not have a column called set.size. You may count the rows as the set size (i.e. the number of genes mapped). But there can be duplicated entries (gene/transcript IDs, or rownames in your input data) that map to the same gene in the gene set. In other words, the set.size column in the gage() output counts the unique genes present in your input data, while the number of rows in the geneData() output may be larger because of duplicated mapping. Hope this makes sense.
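To make the counting difference concrete, here is a minimal Python sketch (not gage's own code) that assumes each input row already carries a gene ID. All probe and gene IDs are hypothetical. Counting matched rows keeps duplicates, which mirrors the row count of a geneData() table. Counting unique genes mirrors what set.size reports.

```python
# Hypothetical data for illustration only; not taken from gage or KEGG.
# Gene set for one pathway (made-up gene IDs).
gene_set = {"100", "200", "300", "400"}

# Hypothetical input rows: (row name, gene ID the row maps to).
# Two rows map to gene "100" and two rows map to gene "300".
rows = [
    ("probe_a", "100"),
    ("probe_b", "100"),
    ("probe_c", "200"),
    ("probe_d", "300"),
    ("probe_e", "300"),
    ("probe_f", "999"),  # gene not in the set, so it is ignored
]


def rows_in_set(rows, gene_set):
    """All rows whose gene is in the set, duplicates kept (like counting rows of a geneData() table)."""
    return [row for row in rows if row[1] in gene_set]


def set_size(rows, gene_set):
    """Number of unique set genes present in the input (analogous to gage()'s set.size)."""
    return len({gene for _, gene in rows if gene in gene_set})


matched = rows_in_set(rows, gene_set)
print(len(matched))               # 5 rows, including duplicates
print(set_size(rows, gene_set))   # 3 unique genes ("400" is absent from the input)
```

Deduplicating the gene column of the row-level table (unique() in R, a set in Python) brings the two numbers back into agreement.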

serpalma.v (@serpalmav-8912), Germany

Thanks Luo. I have checked the IDs in the "Gene" column of the table produced by geneData(), and as you said, some IDs are repeated. Once I apply unique(), I get the expected length.

One more thing. The set.size column reports more genes than I can see in the actual pathway map. For example, "ssc04340 Hedgehog signaling pathway" is significantly enriched with 34 genes, but when I look at the pathway map I count far fewer, only 19.

Is this because some genes are assigned to the same gene product, or because they belong to another map associated with the pathway?


Cheers!

You are right on this. As described in the vignette, in the native KEGG view a gene node may represent multiple genes/proteins with similar or redundant functional roles. The number of member genes per node ranges from 1 up to several tens.
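As a rough illustration of why counting boxes on the map undercounts genes, the Python sketch below uses a hypothetical node-to-gene mapping (made-up node names and gene IDs, not parsed from a real KGML file). Each graphical node lists its member genes, and the gene count is taken over the unique members of all nodes.

```python
# Hypothetical KEGG-like layout: each graphical node lists its member genes.
# Node names and gene IDs are made up for illustration.
nodes = {
    "node_1": ["100", "101", "102"],  # one box standing for three paralogs
    "node_2": ["200"],
    "node_3": ["300", "301"],
}


def count_nodes(nodes):
    """Number of boxes you would count by eye on the pathway map."""
    return len(nodes)


def count_member_genes(nodes):
    """Number of unique genes behind those boxes (a gene may sit in several nodes)."""
    return len({gene for members in nodes.values() for gene in members})


print(count_nodes(nodes))         # 3 nodes on the map
print(count_member_genes(nodes))  # 6 member genes
```

The gene set used by gage() is built from the member genes rather than from the boxes, so its size can exceed the number of nodes visible on the map.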