I used Pathview to overlay my gene expression data on a KEGG pathway (hsa05200). This pathway contains a large number of genes, and it seems that one gene symbol on the graph can represent several genes. For instance, if I hover over "HDAC" on the graph, two gene symbols pop up (HDAC1 and HDAC2). My expression values are HDAC1 = -2.5 and HDAC2 = 3.2, and when I run Pathview the node is labeled HDAC1 rather than HDAC2, even though HDAC2 has the larger value. Is this a bug? How can I display the gene with the highest value when a node contains multiple genes?
This is not a bug. From page 10 of the Pathview tutorial:
Note in native KEGG view, a gene node may represent multiple genes/proteins with similar or redundant functional role. The number of member genes range from 1 up to several tens. They are intentionally put together as a single node on pathway graphs for better clarity and readability. Therefore, we do not split node and mark each member genes separately by default. But rather we visualize the node-wise data by summarize gene-wise data, users may specify the summarization method using node.sum argument. Potential options include "sum", "mean", "median", "max", "max.abs" and "random".
The default is node.sum="sum"; in your case you can use node.sum="max".
Yes, I set node.sum="max", but the KEGG graph box still shows the name HDAC1 while carrying HDAC2's maximum value. I understand that node.sum takes the "max" value when a node has multiple genes, but the label does not show the name of the gene that has that maximum value.
You are right. All nodes with multiple mapped genes are labeled with the most representative protein/gene name. We do not use the name of the gene with the maximal expression level or change. This makes more sense for most summary methods other than "max", such as "sum", "mean" and "median".
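To make the point concrete, here is a minimal sketch of the node-wise arithmetic for the HDAC example. It is written in plain Python for illustration only and is not Pathview code: the helper names `summarize_node` and `top_gene` are hypothetical, the option names are taken from the tutorial text, "random" is left out, and I am assuming that "max.abs" keeps the sign of the value with the largest magnitude.

```python
import statistics

# Hypothetical helpers; the option names follow the tutorial text.
SUMMARIES = {
    "sum": sum,
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    # Assumption: the value with the largest magnitude is returned with its sign.
    "max.abs": lambda values: max(values, key=abs),
}

def summarize_node(members, method="sum"):
    """Collapse the values of a node's member genes into one number."""
    return SUMMARIES[method](list(members.values()))

def top_gene(members):
    """Return the name of the member gene with the largest value."""
    return max(members, key=members.get)

# Expression values from the question.
hdac = {"HDAC1": -2.5, "HDAC2": 3.2}

for method in SUMMARIES:
    print(method, round(summarize_node(hdac, method), 2))
print("gene behind 'max':", top_gene(hdac))
```

Only "max" (and here "max.abs") traces back to a single member gene; for "sum", "mean" and "median" the node value belongs to no one gene, so a fixed representative label is the consistent choice. If you need to know which gene supplied the maximum, you can look it up from your own expression data as `top_gene` does.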
Thank you for the quick reply!