I am trying to analyze the PI3K-Akt pathway from KEGG using the graphite package, but I noticed some differences between the pathway diagram on the KEGG website (http://www.genome.jp/kegg-bin/show_pathway?hsa04151) and the data I downloaded.
For instance, the diagram shows a repression edge from FOXO to CCND1, but there is no repression edge in the data (I searched with which(pkegg@edges$type == "Process(repression)")). Conversely, the data contain a directed edge between CCND1 and CDK that does not appear in the diagram.
Has anyone run into this kind of inconsistency before?
graphite developers here. We have checked the conversion of the pathway you mention.
In the first case (repression from FOXO to CCND1), the KGML file for the pathway does not include that relation. We have already observed a number of cases in which the graphical map does not correspond exactly to its KGML description. This tends to happen, as it does here, whenever DNA appears as an intermediate entity. Because we rely only on KGML, our software cannot recover that edge.
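One way to confirm this yourself is to look for the relation directly in the KGML file. The sketch below uses Python's standard XML parser on a tiny hand-written KGML-like fragment. The entry ids, gene names, and relations in it are made up for illustration and are not taken from hsa04151, and the helper name `relations_with_subtype` is hypothetical; with the real file you would read the XML from disk instead.

```python
import xml.etree.ElementTree as ET

# Hand-written, KGML-like fragment for illustration only (NOT real hsa04151 data).
KGML_FRAGMENT = """
<pathway name="path:example">
  <entry id="1" name="hsa:0001" type="gene"/>
  <entry id="2" name="hsa:0002" type="gene"/>
  <entry id="3" name="hsa:0003" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
  <relation entry1="2" entry2="3" type="GErel">
    <subtype name="repression" value="--|"/>
  </relation>
</pathway>
"""


def relations_with_subtype(kgml_text, subtype_name):
    """Return (source name, target name) pairs for relations with the given subtype."""
    root = ET.fromstring(kgml_text)
    # Map each entry id to its name so relations can be reported by name.
    names = {entry.get("id"): entry.get("name") for entry in root.iter("entry")}
    hits = []
    for rel in root.iter("relation"):
        subtypes = [s.get("name") for s in rel.findall("subtype")]
        if subtype_name in subtypes:
            hits.append((names[rel.get("entry1")], names[rel.get("entry2")]))
    return hits


repressions = relations_with_subtype(KGML_FRAGMENT, "repression")
print(repressions)
```

If a relation drawn on the map returns no hit when the same search is run on the real KGML, a converter that reads only KGML has nothing to turn into an edge.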
The second interaction, linking CCND1 and CDK, is marked in graphite as a "binding" edge. In the original pathway, CDK forms a complex with Cyclin, a node that itself comprises CCND1 and other genes. When graphite resolves such groups (that is, single KEGG nodes composed of multiple genes), it introduces binding edges among all their constituent parts. As a result, you won't find these edges in the original pathway.
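The group expansion described above can be illustrated with a small, self-contained sketch. This is not graphite's actual code: the function name `expand_group` and the gene list are hypothetical, chosen only to show how a single multi-gene node turns into pairwise binding edges.

```python
from itertools import combinations


def expand_group(members):
    """Return one undirected 'binding' edge for every pair of group members.

    Hypothetical helper for illustration; not part of graphite.
    """
    unique = sorted(set(members))  # drop duplicates, fix a stable order
    return [(a, b, "binding") for a, b in combinations(unique, 2)]


# Illustrative gene symbols standing in for the members of one complex node.
complex_genes = ["CCND1", "CDK4", "CDK6"]
binding_edges = expand_group(complex_genes)

for source, target, kind in binding_edges:
    print(source, target, kind)
```

A group with n distinct genes yields n(n-1)/2 such edges, which is why large groups show up as densely connected clusters that have no counterpart on the KEGG map.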
Dear Enrica & Gabriele,
Thank you very much for the explanation! May I ask what the rationale is for the following treatment: when a member of a group (protein complex) can be the product of several genes, binding edges are added for each pair of genes within that member and also across group members. For example, for GPCR and Gβγ in the PI3K-Akt pathway, all genes for GPCR and Gβγ are interconnected, forming one big cluster. Why is the same not done for entries containing multiple genes that are connected by a directed edge?
Thank you,
Haoran