Question: Enrichment analysis for shotgun proteomics data using the ReactomePA package: problem with loss of identifiers
26 days ago, Miguel.Cosenza (Brazil/Minas Gerais/Universidade Federal de Ouro Preto) wrote:

Dear all, 

I have some questions about how the enrichPathway function works.

I know that the default input requires all genes to be converted to ENTREZID identifiers. My data use UniProt identifiers, and when I convert them, almost 20% of the proteins fail to map.

For the ID mapping I tried both http://www.uniprot.org/uploadlists/ and the bitr() function in R. Both return a similar number of mappings and therefore a similar loss of identifiers. That is why I would like to run my enrichment analysis on UniProt identifiers directly.
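To put a number on the mapping loss described above, something like the following base R sketch could be used. It assumes the mapping table has the two-column shape that bitr() returns (UNIPROT, ENTREZID). The helper name mapping_loss, the made-up accessions "X00001" and "X00002", and the choice of a human OrgDb in the commented call are illustrative assumptions, not something stated in this thread.

```r
# Sketch: count how many input identifiers are lost in an ID mapping.
# `mapping_loss` is a hypothetical helper, not part of clusterProfiler.
mapping_loss <- function(input_ids, mapping, from_col = "UNIPROT") {
  input_ids <- unique(input_ids)
  # Identifiers that never appear in the mapping table were lost.
  unmapped <- setdiff(input_ids, mapping[[from_col]])
  list(
    n_input       = length(input_ids),
    n_unmapped    = length(unmapped),
    fraction_lost = length(unmapped) / length(input_ids),
    unmapped      = unmapped
  )
}

# Placeholder list: two real accessions plus two made-up ones that cannot map.
uniprot_ids <- c("P04637", "P38398", "X00001", "X00002")

# With clusterProfiler and an OrgDb package installed, the table could come from
#   mapping <- clusterProfiler::bitr(uniprot_ids, fromType = "UNIPROT",
#                                    toType = "ENTREZID", OrgDb = "org.Hs.eg.db")
# (human is an assumption). A toy table stands in for that result here.
mapping <- data.frame(
  UNIPROT  = c("P04637", "P38398"),
  ENTREZID = c("7157", "672"),
  stringsAsFactors = FALSE
)

loss <- mapping_loss(uniprot_ids, mapping)
print(loss$fraction_lost)  # share of input identifiers without a mapping
print(loss$unmapped)       # the identifiers worth checking by hand
```

Listing the unmapped identifiers, rather than only counting them, makes it possible to check whether the loss is concentrated in particular protein families or spread evenly.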

I would like enrichPathway to accept UniProt identifiers so that I avoid this loss, but I learned from Dr. Guangchuang Yu that this is not possible, because the ReactomePA package relies on reactome.db, which is annotated only with ENTREZID.

I also submitted my UniProt identifiers directly to the Reactome.org website. The number of enriched pathways is very similar; ReactomePA even reports slightly more. Nevertheless, some pathways are enriched on Reactome.org but not in ReactomePA and vice versa, and the number of identifiers per enriched pathway is lower in ReactomePA (I assume because of the loss in mapping).

I am looking for ways to evaluate how important ('significant'?) this loss of information is for my pathway analyses, because I know that no database is completely annotated and some loss of this kind is to be expected.
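One rough way to gauge this is to recompute the over-representation p-value of a pathway with and without the lost identifiers. The sketch below assumes the usual upper-tail hypergeometric test for over-representation; ora_pvalue is a hypothetical helper and every count in it is invented for illustration. It also keeps the background fixed, which is a simplification.

```r
# Upper-tail hypergeometric p-value for one pathway (over-representation test).
#   k: input genes annotated to the pathway
#   n: input genes in total
#   K: background genes annotated to the pathway
#   N: background genes in total
# `ora_pvalue` is a hypothetical helper, not a ReactomePA function.
ora_pvalue <- function(k, n, K, N) {
  # P(X >= k) when drawing n genes from N, of which K belong to the pathway
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# All counts below are invented for illustration.
N <- 10000  # background size
K <- 100    # pathway size

# Full list: 2000 input genes, 40 of them in the pathway.
p_full <- ora_pvalue(k = 40, n = 2000, K = K, N = N)

# 20% of the input lost evenly: 1600 genes remain, 32 in the pathway.
p_even <- ora_pvalue(k = 32, n = 1600, K = K, N = N)

# Same overall loss, but concentrated in this pathway: only 25 hits remain.
p_biased <- ora_pvalue(k = 25, n = 1600, K = K, N = N)

print(c(full = p_full, even_loss = p_even, biased_loss = p_biased))
```

Comparing the three values shows whether a pathway's call is robust to the loss. An evenly spread loss mainly costs statistical power, whereas a loss concentrated in one pathway can change whether that pathway is reported at all.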

I would appreciate any advice on how to approach this issue in my enrichment analysis, including recommendations of other software or enrichment databases.

modified 26 days ago by willem.ligtenberg • written 26 days ago by Miguel.Cosenza
26 days ago, willem.ligtenberg (Netherlands) wrote:

As the creator of reactome.db, I could support a UniProt-to-Reactome pathway mapping in my package.
However, we are past the feature freeze in this release cycle, so this addition would only become available in half a year. Obviously, I could make it available somewhere else, but it would technically be outside of Bioconductor, which makes it difficult/weird for ReactomePA to use this functionality.

I am fine with making this change; I think I could even do it today. However, I am not sure how it would help you right now.
That said, if ReactomePA is willing to make a special version available to you in the meantime as well, that might work.

written 26 days ago by willem.ligtenberg

Using the devel branch is not outside the bounds of user experience! I'd make the change to devel shortly after October 31, rather than trying to make it available in some non-standard manner.

written 26 days ago by Martin Morgan

That is a good suggestion, although I don't know how to make an annotation package available in devel. They are treated a bit differently from the normal packages, as they are not in git. But I am sure we can figure that out.
 

written 26 days ago by willem.ligtenberg

First of all, many thanks for the feedback and support!

I am very glad to know that this could be done. It would be really helpful, as many proteomics tools report protein identifications as UniProt Accession or UniProt ID. I am at the starting point of my master's research, in a pilot stage, so any such change to the packages would be very useful to me over the course of the project.

 

On the other hand, I have an update on the ID mapping tests that I am running.

I converted my IDs from 'UniProt Accession' to 'Entrez ID' through the DAVID database (https://david.ncifcrf.gov/), and the loss was much smaller: only 11 of 2221 identifications (about 0.5%) failed to map.

Nevertheless, it would of course be very useful to be able to run the enrichment analysis directly on UniProt identifiers.

written 26 days ago by Miguel.Cosenza