Question

Enrichment analysis for shotgun proteomics data using the ReactomePA package: identifiers lost during ID mapping

Miguel.Cosenza ▴ 10

@miguelcosenza-14267

Germany / Freiburg / University Medical…

Dear all,

I have some questions about how the enrichPathway function works.

I know that the default input requires all genes to be converted to ENTREZID identifiers. My data uses UNIPROT accessions, and when I convert them, almost 20% of the proteins fail to map.

I used both http://www.uniprot.org/uploadlists/ and the bitr() function in R to map the IDs. Both return a similar number of mapped identifiers, so the loss is about the same either way. That is why I would like to run my enrichment analysis on the Uniprot identifiers directly.

I would like the enrichPathway function to accept UNIPROT identifiers so that I avoid this loss, but I learned from Dr. Guangchuang Yu that this is not possible: the ReactomePA package relies on reactome.db, which is annotated only with ENTREZID.

I also submitted my Uniprot identifiers directly on the Reactome.org website, and the number of enriched pathways is very similar; ReactomePA even reports slightly more enriched pathways. Nevertheless, some pathways are enriched on Reactome.org but not in ReactomePA and vice versa, and the number of identifiers per enriched pathway is lower in ReactomePA (I assume because of the loss during ID conversion).

I am looking for ways to evaluate how important (or 'significant') this loss of information is for my pathway analyses, because I know that no database is completely annotated and one should expect this kind of thing to happen.
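One way to gauge the impact is to run the over-representation test for a pathway twice, once on the full list and once on the list that survives ID conversion, and compare the p-values. The sketch below does this with a plain hypergeometric tail probability in Python, using only the standard library. All numbers are invented for illustration; it assumes the background stays the same and the pathway hits shrink in proportion to the list. It is not how ReactomePA or Reactome.org compute their statistics internally.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Return P(X >= k): the chance of seeing at least k pathway members
    when n proteins are drawn from a background of N proteins,
    K of which belong to the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# All numbers below are invented for illustration.
BACKGROUND = 20000  # annotated proteins in the universe (assumed)
PATHWAY = 150       # size of one pathway (assumed)

# Full list: 2000 proteins, 30 of them in the pathway.
p_full = hypergeom_pvalue(k=30, n=2000, K=PATHWAY, N=BACKGROUND)

# 20% of the list fails to map; pathway hits drop proportionally.
p_lost = hypergeom_pvalue(k=24, n=1600, K=PATHWAY, N=BACKGROUND)

print(f"full list:      p = {p_full:.3g}")
print(f"after 20% loss: p = {p_lost:.3g}")
```

Even when the lost proteins are spread evenly, the shorter list gives weaker evidence for the same fold enrichment. If the unmapped proteins cluster in particular pathways, the effect on those pathways will be larger, so it is worth checking which identifiers are lost and not only how many.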

I would appreciate any advice that I could receive about how to approach this issue for my enrichment analysis, even recommendations of other software or enrichment databases.

reactomepa reactome.db reactome proteomics uniprot accessions • 1.9k views

ADD COMMENT • link updated 6.5 years ago by willem.ligtenberg ▴ 150 • written 6.5 years ago by Miguel.Cosenza ▴ 10

score 1 · Answer 1 · 2017-10-26

willem.ligtenberg ▴ 150

@willemligtenberg-6989

Netherlands

As the creator of reactome.db, I could support a UniProt-to-Reactome pathway mapping in my package.
However, we are past the feature freeze for this release cycle, so the addition would only become available in about half a year. I could make it available somewhere else, but that would technically be outside of Bioconductor, which makes it difficult (and a bit odd) for ReactomePA to use the functionality.

I am fine with making this change; I think I could even do it today. However, I am not sure how it would help you right now.
That said, if ReactomePA is willing to make a special version available to you in the meantime as well, that might work.

ADD COMMENT • link 6.5 years ago willem.ligtenberg ▴ 150

Using the devel branch is not outside the bounds of user experience! I'd make the change to devel shortly after October 31, rather than trying to make it available through some non-standard manner.

ADD REPLY • link 6.5 years ago Martin Morgan 25k

That is a good suggestion, although I don't know how to make an annotation package available in devel. They are treated a bit differently from normal packages, as they are not in git. But I am sure we can figure that out.

ADD REPLY • link 6.5 years ago willem.ligtenberg ▴ 150

First of all, many thanks for the feedback and support!

I am very glad to know that this can be done. It would be really helpful, as many proteomics tools report protein identifications as Uniprot Accessions or Uniprot IDs. I am at the starting point of my master's research, in a pilot stage, so any modification to the packages would be very useful to me over the course of the project.

On another note, I have an update on the ID mapping tests I have been running.

I converted my IDs from 'Uniprot Accession' to 'Entrez ID' through the DAVID database (https://david.ncifcrf.gov/), and the loss was much smaller: only 11 of 2221 identifications failed to map.
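Since different mapping services miss different identifiers, it can help to list exactly which IDs each one fails on and to merge the tables. The following Python sketch illustrates the idea with placeholder identifiers; the helper names `unmapped` and `merge_mappings` and all example IDs are hypothetical and do not come from any of the tools mentioned in this thread. The last line computes the loss rate for the DAVID result reported above (11 of 2221, about 0.5%).

```python
def unmapped(query_ids, mapping):
    """Return the sorted query IDs that have no entry in `mapping`."""
    return sorted(set(query_ids) - set(mapping))


def merge_mappings(*mappings):
    """Combine mapping tables; earlier tables win when several map an ID."""
    merged = {}
    for table in mappings:
        for source_id, target_id in table.items():
            merged.setdefault(source_id, target_id)
    return merged


# Placeholder identifiers, not real accessions or Entrez IDs.
query = ["ACC1", "ACC2", "ACC3", "ACC4", "ACC5"]
mapper_a = {"ACC1": "101", "ACC2": "102", "ACC3": "103"}
mapper_b = {"ACC3": "103", "ACC4": "104"}

combined = merge_mappings(mapper_a, mapper_b)
for name, table in [("A", mapper_a), ("B", mapper_b), ("A+B", combined)]:
    missing = unmapped(query, table)
    rate = len(missing) / len(query)
    print(f"{name}: {len(missing)}/{len(query)} unmapped ({rate:.1%}) {missing}")

# Loss rate for the figure reported in the thread.
print(f"DAVID loss: {11 / 2221:.2%}")
```

When two services disagree on the target for the same accession, the merge above silently keeps the first one; in a real analysis those conflicts would deserve a manual look.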

Nevertheless, it would be of course very useful if one could make the enrichment analysis directly from Uniprot identifications.

ADD REPLY • link 6.5 years ago Miguel.Cosenza ▴ 10