Dear all,
I have some questions regarding the functioning of the enrichPathway function.
I know that, by default, the function expects all genes to be given as ENTREZID identifiers. My data use UniProt identifiers, and when I convert them, almost 20% of the proteins fail to map.
I used http://www.uniprot.org/uploadlists/ and the bitr() function in R to map the IDs. Both return a similar number of 'translations', so a similar share of identifiers is lost either way. That is why I would like to run my enrichment analysis on UniProt identifiers directly.
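A minimal sketch of how the unmapped fraction could be measured from a bitr()-style result. It assumes human data (org.Hs.eg.db), which is not stated in the post; map_uniprot_to_entrez() and mapping_loss() are hypothetical helper names, and the accessions in the toy example are placeholders, not real UniProt entries.

```r
# Hypothetical wrapper around clusterProfiler::bitr(). It is defined but not
# called here. Assumes human data and that clusterProfiler and org.Hs.eg.db
# are installed.
map_uniprot_to_entrez <- function(ids) {
  clusterProfiler::bitr(ids,
                        fromType = "UNIPROT",
                        toType   = "ENTREZID",
                        OrgDb    = org.Hs.eg.db::org.Hs.eg.db)
}

# Summarise how many input identifiers are missing from a mapping table.
# `mapping` is a data frame whose `from` column holds the input identifiers.
mapping_loss <- function(ids, mapping, from = "UNIPROT") {
  ids <- unique(ids)
  unmapped <- setdiff(ids, mapping[[from]])
  list(n_input           = length(ids),
       n_unmapped        = length(unmapped),
       fraction_unmapped = length(unmapped) / length(ids),
       unmapped          = unmapped)
}

# Toy example with placeholder accessions (not real UniProt entries).
ids <- c("ACC1", "ACC2", "ACC3", "ACC4", "ACC5")
toy_mapping <- data.frame(
  UNIPROT  = c("ACC1", "ACC2", "ACC2", "ACC3", "ACC4"),  # ACC2 maps twice
  ENTREZID = c("101", "102", "103", "104", "105"),
  stringsAsFactors = FALSE
)
loss <- mapping_loss(ids, toy_mapping)
print(loss$fraction_unmapped)
print(loss$unmapped)
```

The loss is counted over unique input identifiers and not over rows of the mapping table, because one accession can map to several Entrez IDs.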
I would like enrichPathway to accept UniProt identifiers so that I avoid this loss, but I learned from Dr. Guangchuang Yu that this is not possible, because the ReactomePA package relies on reactome.db, which is annotated only with ENTREZID.
I also submitted my UniProt identifiers directly to the Reactome.org website, and the number of enriched pathways is very similar; ReactomePA even enriches slightly more pathways. Nevertheless, some pathways are enriched by Reactome.org but not by ReactomePA and vice versa, and the number of identifiers per enriched pathway is lower with ReactomePA (I assume because of the loss during conversion).
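The comparison described above could be made more systematic along these lines. This is only a sketch: compare_pathways() is a hypothetical helper, the pathway IDs are placeholders, and it assumes both tools report pathways with the same identifier scheme, which would need checking first.

```r
# Compare two sets of enriched pathway identifiers.
# For ReactomePA, the IDs could be taken from as.data.frame(result)$ID;
# the Reactome.org side would come from whatever the website exports.
compare_pathways <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  shared <- intersect(a, b)
  list(shared  = shared,
       only_a  = setdiff(a, b),
       only_b  = setdiff(b, a),
       # Jaccard index: shared pathways divided by all pathways seen
       jaccard = length(shared) / length(union(a, b)))
}

# Placeholder IDs for illustration only.
reactomepa_ids <- c("PW1", "PW2", "PW3")
website_ids    <- c("PW2", "PW3", "PW4")

cmp <- compare_pathways(reactomepa_ids, website_ids)
print(cmp$only_a)   # enriched only by the first tool
print(cmp$only_b)   # enriched only by the second tool
print(cmp$jaccard)
```

Looking at which pathways fall into only_a and only_b, and how close to the significance cutoff they were, may show whether the disagreement comes from the ID conversion or from other differences between the tools.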
I am looking for ways to evaluate how important ('significant'?) this loss of information is for my pathway analyses, since I know that no database is completely annotated and this kind of thing is to be expected.
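One rough way to gauge the effect is to see how an over-representation p-value moves when a share of the input list disappears. The sketch below uses a plain hypergeometric test, the usual basis for this kind of enrichment analysis; ora_pvalue() is a hypothetical helper, not a ReactomePA function, and all counts are made-up illustrative numbers.

```r
# Hypergeometric over-representation p-value: the probability of seeing at
# least k pathway members in a list of n genes, given a pathway of size M
# and a background of N genes.
ora_pvalue <- function(k, n, M, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

N <- 20000  # background size (made-up)
M <- 100    # pathway size (made-up)

# Full list: 2000 proteins, 25 of them in the pathway.
p_full <- ora_pvalue(k = 25, n = 2000, M = M, N = N)

# Same list after losing 20% of the identifiers, assuming the loss hits
# pathway members and non-members in the same proportion.
p_lossy <- ora_pvalue(k = 20, n = 1600, M = M, N = N)

print(c(full = p_full, lossy = p_lossy))
```

In this made-up case the enrichment ratio is unchanged but the p-value becomes larger, so pathways near the cutoff can drop out. If the unmapped proteins are concentrated in particular pathways, the effect could be stronger for those pathways, so it may be worth checking what the unmapped identifiers have in common.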
I would appreciate any advice on how to approach this issue in my enrichment analysis, including recommendations of other software or enrichment databases.
Using the devel branch is not outside the bounds of user experience! I'd make the change to devel shortly after October 31, rather than trying to make it available in some non-standard way.
That is a good suggestion, although I don't know how to make an annotation package available in devel. They are treated a bit differently from the normal packages, as they are not in git. But I am sure we can figure that out.
First of all, many thanks for the feedback and support!
I am very glad to know that this could be done. It would be really helpful, as many proteomics tools report protein identifications as UniProt accessions or UniProt IDs. I am at the starting point of my master's research, in a pilot stage, so any change that could be made to the packages would be very useful to me during the course of the project.
On another note, I have an update on the ID mapping tests I have been running.
I converted my IDs from 'Uniprot Accession' to 'Entrez ID' through the DAVID database (https://david.ncifcrf.gov/), and the loss was much smaller: only 11 of 2221 identifications failed to map.
Nevertheless, it would of course be very useful to be able to run the enrichment analysis directly on UniProt identifiers.