I have some questions about how the enrichPathway function works.
I know that, by default, the input genes must be converted to ENTREZID identifiers. My data are in UNIPROT accessions, and when I convert them, almost 20% of the proteins fail to map.
I tried both http://www.uniprot.org/uploadlists/ and the bitr() function in R for the ID mapping. The two map a similar number of identifiers, so I end up losing a similar share of them either way. That is why I would like to run my enrichment analysis on the UniProt identifiers directly.
I would like enrichPathway to accept UNIPROT identifiers so that I avoid that loss, but I learned from Dr. Guangchuang Yu that this is not possible, because the ReactomePA package relies on reactome.db, which is annotated only with ENTREZID.
I also submitted my UniProt identifiers directly to the Reactome.org website. The number of enriched pathways is very similar; ReactomePA even reports slightly more. Nevertheless, some pathways are enriched on Reactome.org but not in ReactomePA and vice versa, and the number of identifiers per enriched pathway is lower in ReactomePA (I assume because of the identifiers lost in the conversion).
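To make the comparison above concrete, this is roughly how I am tallying the agreement between the two result lists. It is only a minimal Python sketch: the function name `compare_pathway_sets` is my own, and the pathway labels are made-up placeholders, not my real results or real Reactome identifiers.

```python
def compare_pathway_sets(result_a, result_b):
    """Summarise the overlap between two lists of enriched pathway IDs."""
    a, b = set(result_a), set(result_b)
    union = a | b
    return {
        "shared": sorted(a & b),   # enriched in both tools
        "only_a": sorted(a - b),   # enriched only in the first tool
        "only_b": sorted(b - a),   # enriched only in the second tool
        # Jaccard index: shared / union (defined as 1.0 if both lists are empty)
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Placeholder labels standing in for pathway IDs (hypothetical data).
reactomepa_hits = ["PATHWAY_A", "PATHWAY_B", "PATHWAY_C", "PATHWAY_D"]
website_hits = ["PATHWAY_B", "PATHWAY_C", "PATHWAY_E"]

summary = compare_pathway_sets(reactomepa_hits, website_hits)
print(summary)
```

A summary like this only tells me how much the two lists disagree, not whether the disagreement comes from the ID conversion or from other differences between the tools (for example, database versions or the background set).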
I am looking for ways to evaluate how important (or "significant") this loss of information is for my pathway analyses. I know that no database is completely annotated, so some loss of this kind is to be expected.
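One way I can think of to gauge the effect is to recompute the over-representation p-value for a pathway with and without the lost identifiers. The sketch below assumes the enrichment test is a one-sided hypergeometric test, which is my understanding of how this kind of analysis is usually done. The function name `enrichment_pvalue` and all the counts are hypothetical, chosen only to mimic a 20% loss that hits the pathway and the rest of the list equally.

```python
from math import comb


def enrichment_pvalue(overlap, list_size, pathway_size, universe_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    overlap       -- genes from my list that fall in the pathway
    list_size     -- genes in my list (after ID mapping)
    pathway_size  -- genes annotated to the pathway
    universe_size -- all annotated genes used as background
    """
    max_overlap = min(list_size, pathway_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 500 proteins with 15 in the pathway, versus the same
# list after losing 20% of its IDs (400 proteins, 12 in the pathway).
p_full = enrichment_pvalue(15, 500, 100, 10000)
p_lossy = enrichment_pvalue(12, 400, 100, 10000)
print(f"all IDs kept: {p_full:.3g}; after 20% loss: {p_lossy:.3g}")
```

With the same fold enrichment, the shorter list generally gives a larger p-value, so a uniform loss should mainly cost statistical power. What I cannot judge from this is the case where the unmapped proteins are concentrated in particular pathways.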
I would appreciate any advice on how to approach this issue in my enrichment analysis, including recommendations of other software or enrichment databases.