In the Path2PPI package vignette, section 2.1, the lists of proteins associated with the pathway of interest (i.e. "autophagy induction") in yeast and human are provided as two named character vectors, "yeast.ai.proteins" and "human.ai.proteins".
I found the explanation of how to do this for an organism or pathway of one's own choosing rather lacking. It would be helpful to have more detail on how to carry out these steps oneself.
1) How does one find the proteins associated with a pathway of interest in a reference species? I thought this information would be available from the BioCyc Pathway Database Collection, but searching for "autophagy induction" there returns no results. Where did the package developer get this information? I assume that once you have the HGNC gene symbols and/or protein identifiers (e.g. UniProt, Ensembl, Swiss-Prot), you could easily use biomaRt to retrieve the other identifiers and put them into a named character vector like this:
    > head(human.ai.proteins)
    P42345 O75385 Q8IYT8 Q6PHR2  O75143
    "MTOR" "ULK1" "ULK2" "ULK3" "ATG13"
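To make the second half of question 1 concrete, here is roughly what I have in mind: a minimal sketch that turns a two-column mapping table (such as the data frame returned by biomaRt's getBM()) into a named vector laid out like human.ai.proteins, with UniProt accessions as names and gene symbols as values. The helper `mapping_to_named_vector()` is hypothetical (it is not part of Path2PPI or biomaRt), and the attribute names in the commented biomaRt call ("uniprotswissprot", "hgnc_symbol") are assumptions that should be checked with listAttributes(). This does not answer where the vignette's pathway protein list came from.

```r
# Hypothetical helper (not part of Path2PPI or biomaRt): convert a mapping
# table into a named character vector, names = accessions, values = symbols.
mapping_to_named_vector <- function(mapping, id_col, symbol_col) {
  ids <- mapping[[id_col]]
  symbols <- mapping[[symbol_col]]
  # drop rows where either the accession or the symbol is missing or empty
  keep <- !is.na(ids) & nzchar(ids) & !is.na(symbols) & nzchar(symbols)
  mapping <- mapping[keep, , drop = FALSE]
  # keep one row per accession (the first occurrence wins)
  mapping <- mapping[!duplicated(mapping[[id_col]]), , drop = FALSE]
  setNames(as.character(mapping[[symbol_col]]), mapping[[id_col]])
}

# With biomaRt this might be fed as follows (needs network access; the
# attribute and filter names are assumptions to verify with listAttributes()):
# library(biomaRt)
# mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# bm <- getBM(attributes = c("uniprotswissprot", "hgnc_symbol"),
#             filters = "hgnc_symbol", values = c("MTOR", "ULK1"), mart = mart)
# my.proteins <- mapping_to_named_vector(bm, "uniprotswissprot", "hgnc_symbol")

# Toy input: a duplicated row and a blank row added for illustration only
toy <- data.frame(
  uniprotswissprot = c("P42345", "O75385", "O75385", ""),
  hgnc_symbol      = c("MTOR",   "ULK1",   "ULK1",   ""),
  stringsAsFactors = FALSE
)
res <- mapping_to_named_vector(toy, "uniprotswissprot", "hgnc_symbol")
print(res)
```

Dropping duplicates matters because a BioMart query can return several rows per gene, whereas a named vector should have one entry per accession.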
2) Next, the vignette states that "these protein of interest are applied to find relevant interaction in the corresponding species iRefIndex file...only a very small part of the iRefIndex file for yeast and human are provided which contain the relevant interaction necessary for this tutorial" (i.e. "human.ai.irefindex" and "yeast.ai.irefindex"). Once again, I'm left scratching my head as to how one would do this. I haven't downloaded the entire iRefIndex file to look at it yet, but perhaps one could use dplyr's inner_join() to query it? For example (pseudo-code):
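    library(dplyr)
    # UniProt accessions are the names, gene symbols the values
    p <- head(human.ai.proteins)
    # one-column data frame of gene symbols (without overwriting the original vector)
    proteins.df <- data.frame(Gene_symbol = unname(p), stringsAsFactors = FALSE)
    # iRefIndex: the full iRefIndex table, assumed here to have a Gene_symbol column
    human.ai.irefindex <- iRefIndex %>% inner_join(proteins.df, by = "Gene_symbol")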
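Since each iRefIndex row describes an interaction between two interactors (A and B), I suspect a filter on both interactor columns may fit better than a single inner_join(). Below is a minimal base R sketch, assuming the interactor identifiers sit in columns named uidA and uidB in the form "uniprotkb:<accession>" (an assumption about the MITAB layout that should be checked against the downloaded file). `filter_irefindex()` is a hypothetical helper, and the toy rows use made-up identifiers, not real iRefIndex records.

```r
# Hypothetical helper: keep only the rows in which BOTH interactors belong to
# the proteins of interest (replace & with | to keep rows where EITHER does).
filter_irefindex <- function(iref, proteins, col_a = "uidA", col_b = "uidB") {
  # strip a database prefix such as "uniprotkb:" to get bare accessions
  strip_prefix <- function(x) sub("^[^:]*:", "", x)
  ids <- names(proteins)  # accessions are the names of the named vector
  a <- strip_prefix(iref[[col_a]])
  b <- strip_prefix(iref[[col_b]])
  iref[a %in% ids & b %in% ids, , drop = FALSE]
}

# Reading the full file might look like this (the path is a placeholder;
# check names() afterwards, since the first header field may start with "#"):
# iRefIndex <- read.delim("path/to/irefindex_file.txt", quote = "",
#                         stringsAsFactors = FALSE)

# Toy example with made-up identifiers
proteins <- c(P1 = "GENE1", P2 = "GENE2", P3 = "GENE3")
toy_iref <- data.frame(
  uidA = c("uniprotkb:P1", "uniprotkb:P2", "uniprotkb:P1"),
  uidB = c("uniprotkb:P2", "uniprotkb:P3", "uniprotkb:P9"),
  stringsAsFactors = FALSE
)
res <- filter_irefindex(toy_iref, proteins)
print(res)
```

In dplyr terms this would be a filter() on both columns rather than an inner_join(): a join on a single key column matches only one interactor per row and also appends the symbol column to the result.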
Your help would be greatly appreciated.