Question

KEGGgraph parseKGML2Graph function

Cathy • 0

Hi

I am trying to use KEGGgraph to create an adjacency matrix of human KEGG pathways for my random walk model. I have performed the KEGG enrichment analyses and obtained a list of pathways (e.g., hsa04612.xml). I then tried to use parseKGML2Graph to parse the KGML file into a graph object. My question is about setting one of the parameters: expandGenes.

According to the user manual, if expandGenes = TRUE, the function outputs nodes that each have a unique KEGG ID. These nodes also include homologs of the gene products/proteins that are in the pathway. See the example below:

> q = parseKGML2DataFrame('hsa04612.xml', expandGenes=TRUE)
> q
        from       to  type             subtype
1    hsa:972  hsa:972 PPrel        state change
2   hsa:3108 hsa:3108 PPrel        state change
3   hsa:3108 hsa:3109 PPrel        state change
4   hsa:3108 hsa:3111 PPrel        state change
5   hsa:3108 hsa:3112 PPrel        state change
6   hsa:3108 hsa:3113 PPrel        state change

> g <- parseKGML2Graph('hsa04612.xml', expandGenes=TRUE)
> nodes(g)
 [1] "hsa:972"       "hsa:3108"      "hsa:3109"     
 [4] "hsa:3111"      "hsa:3112"      "hsa:3113"     
 [7] "hsa:3115"      "hsa:3117"      "hsa:3118"     
[10] "hsa:3119"      "hsa:3122"      "hsa:3123"

If expandGenes = FALSE, the nodes are labelled with numeric entry indices instead. When I look at the node data, a single node can contain multiple genes (multiple hsa IDs).

> q = parseKGML2DataFrame('hsa04612.xml', expandGenes=FALSE)
> q
   from to  type             subtype
1    18 17 PPrel        state change
2    19 17 PPrel          activation
3    23 21 PPrel          activation
4    23 22 PPrel          activation
5    23 20 PPrel          activation
6    24 18 PPrel          activation
7    25 27 PPrel          activation
8    26 27 PPrel        state change
9    27 18 PPrel        state change

> g <- parseKGML2Graph('hsa04612.xml', expandGenes=FALSE)
> nodes(g)

 [1] "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "33"
[16] "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48"
[31] "49" "50" "51" "52" "53" "54" "55" "56" "57"

#output for one of the nodes
`49`
KEGG Node (Entry '49'):
------------------------------------------------------------
[ displayName ]: TAP1, ABC17, ABCB2, APT1, D6S114E, PSF-1, PSF1, RING4, TAP1*0102N, TAP1N...
[ Name ]: hsa:6890,hsa:6891
[ Type ]: gene
[ Link ]: https://www.kegg.jp/dbget-bin/www_bget?hsa:6890+hsa:6891

Also, the example in the manual included the comment '## only for expert use'.

Is there a reason why expandGenes = FALSE is not recommended?

I am only interested in the genes/proteins that are in the pathway, but with expandGenes = FALSE the nodes are no longer one-to-one with hsa IDs. How should I proceed to make an adjacency matrix with unique identifiers?

Thank you very much for your help!

KEGGgraph • 592 views

Updated 20 months ago by James W. MacDonald • written 20 months ago by Cathy

score 0 · Answer 1 · 2022-11-18

The original graph isn't one-to-one to begin with. Compare what you get when you do:

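library(KEGGgraph)   # provides parseKGML2Graph
library(Rgraphviz)   # provides the plot method for graph objects
## default: expandGenes = TRUE, one node per gene
plot(parseKGML2Graph("hsa04612.xml"))
## versus: one node per KGML entry, which may group several genes
plot(parseKGML2Graph("hsa04612.xml", expandGenes = FALSE))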

The second graph is a simplification of the first, and eliminates much of the existing complexity of the underlying pathway. I would imagine the 'for expert use' implies that one would need to be quite well acquainted with this subject matter to be able to use the simplified graph in lieu of the more complicated 'real' graph.
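As for the adjacency matrix itself, one option is to build it directly from the expanded edge list, since each row of the expandGenes = TRUE data frame already uses unique hsa IDs. The sketch below is only an illustration in base R: the edge list is typed in by hand from the first six rows shown in the question (in practice you would use the data frame returned by parseKGML2DataFrame), and the names edges, ids and adj are made up for this example. It treats edges as directed and unweighted, which is an assumption you may want to revisit for a random walk.

```r
# Edge list copied by hand from the first six rows of the
# expandGenes = TRUE output in the question (illustrative only).
edges <- data.frame(
  from = c("hsa:972", "hsa:3108", "hsa:3108", "hsa:3108", "hsa:3108", "hsa:3108"),
  to   = c("hsa:972", "hsa:3108", "hsa:3109", "hsa:3111", "hsa:3112", "hsa:3113"),
  stringsAsFactors = FALSE
)

# Every gene that appears as a source or a target becomes one row/column.
ids <- unique(c(edges$from, edges$to))

# Start from an all-zero square matrix labelled by KEGG gene ID.
adj <- matrix(0L, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))

# A two-column character matrix indexes by (row name, column name),
# so this marks each directed from -> to edge with 1.
adj[cbind(edges$from, edges$to)] <- 1L

adj
```

If you need an undirected walk, you could symmetrise afterwards with something like adj | t(adj). I believe the graph package also lets you coerce the parsed graph object with as(g, "matrix"), but please check that against the package documentation rather than taking my word for it.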