Question

Discrepancy in gene set size between gage() and geneData()

serpalma.v ▴ 60

@serpalmav-8912

Last seen 2.3 years ago

Germany

Dear community,

I have run GAGE and Pathview on my microarray data. Everything worked fine.

However, the number of genes assigned to a KEGG pathway by gage() differs from the number of genes returned by geneData(), the function used to extract the gene IDs assigned to a significantly enriched pathway.

For example, one pathway has 181 genes in the "set.size" column after running gage(), but when that pathway is passed to geneData() with the whole expression set, it returns 196. I tried two more pathways and got similar results: 34 vs 37 and 15 vs 17.

How can I find out exactly which genes are counted in "set.size" after running gage()?

Many thanks!

gage gene set analysis pathview kegg pathway analysis • 1.6k views

ADD COMMENT • link 8.2 years ago serpalma.v ▴ 60

score 1 · Answer 1 · 2016-02-23

geneData() just extracts and/or visualizes the expression data for the selected gene set(s). Its output text file should not have a column called set.size. You can count the rows as the set size (the number of genes mapped), but there can be duplicated entries (gene/transcript IDs or row names in your input data) mapped to the same gene in the gene set. In other words, the set.size column in the gage() output counts the unique genes present in your input data, whereas the number of rows from geneData() may be larger because of duplicated mappings. Hope this makes sense.
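The difference between the two counts can be reproduced with toy data. This is a minimal base-R sketch, not the actual code of gage() or geneData(): the gene IDs, the `gene_set` vector, and the `gene_table` data frame (a stand-in for the geneData() text output, with an assumed `Gene` column) are all made up for illustration.

```r
# Toy stand-ins; none of these objects come from gage or pathview.
gene_set <- c("1001", "1002", "1003", "1004")  # hypothetical pathway members

# Hypothetical geneData()-style table: gene 1002 appears twice because two
# input rows (e.g. two probes) map to the same gene; 1004 is absent from the data.
gene_table <- data.frame(
  Gene  = c("1001", "1002", "1002", "1003"),
  expr1 = c(0.5, 1.2, 1.1, -0.3),
  stringsAsFactors = FALSE
)

n_rows   <- nrow(gene_table)                      # row count, duplicates included
genes_in <- intersect(gene_set, gene_table$Gene)  # unique set members found in the data
n_unique <- length(genes_in)                      # unique-gene count, as described for set.size

print(n_rows)
print(n_unique)
print(genes_in)
```

If the real geneData() table is read into a data frame, the same `unique()`/`intersect()` idea should give the IDs behind the smaller count, assuming the ID column holds the same identifier type as the gene set.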

score 0 · Answer 2 · 2016-02-24

Thanks Luo. I have checked the IDs in the "Gene" column of the table produced by geneData(). And as you said, some IDs are repeated. Once I apply unique(), I get the proper length.

One more thing: I see more genes in the set.size column than genes in the actual pathway. For example, "ssc04340 Hedgehog signaling pathway" is significantly enriched with 34 genes, but when I look at the pathway I count far fewer than that, only 19.

Is this because several genes are assigned to the same gene product, or because they belong to another map associated with the pathway?

Cheers!

ADD COMMENT • link 8.2 years ago serpalma.v ▴ 60

You are right on this. As described in the vignette, in the native KEGG view a gene node may represent multiple genes/proteins with similar or redundant functional roles. The number of member genes per node ranges from 1 up to several tens.

ADD REPLY • link 8.2 years ago Luo Weijun ★ 1.6k
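The node-versus-gene distinction in the reply above can be illustrated with a toy mapping. This is a base-R sketch under made-up assumptions: the `node_genes` list, the node names, and the gene IDs are hypothetical and are not taken from the ssc04340 pathway or from KEGG.

```r
# Hypothetical mapping from pathway-graph nodes to their member genes.
node_genes <- list(
  nodeA = c("g1", "g2", "g3"),  # one node standing for three similar genes
  nodeB = c("g4"),
  nodeC = c("g5", "g6")
)

n_nodes        <- length(node_genes)                  # boxes visible on the map
n_genes        <- length(unique(unlist(node_genes)))  # distinct genes behind those boxes
genes_per_node <- lengths(node_genes)                 # member genes per node

print(n_nodes)
print(n_genes)
print(genes_per_node)
```

Counting boxes on the rendered map corresponds to `n_nodes`, while a gene-level count such as set.size is closer to `n_genes`, which is why the latter can be larger.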