Question: Change in pathway IDs scheme in reactome.db
assaron wrote:

I just found out that in the recent reactome.db version (1.59.0) the pathway IDs (the PATHID column) have changed. Previously there were 20800 pathways, each specific to an organism. Now there are 2185 pathways, shared between organisms. I'm not sure what the reasons for the change were, and I'm fine with either scheme. However, I find it inconsistent that a single pathway ID now maps to multiple pathway names.

With reactome.db 1.58.0:

> length(keys(reactome.db, keytype="PATHID"))
[1] 20800
> AnnotationDbi::select(reactome.db, keys=c("Bos taurus: Interleukin-6 signaling", "Homo sapiens: Interleukin-6 signaling"), columns=c("PATHID"), keytype=c("PATHNAME"))
'select()' returned 1:1 mapping between keys and columns
                               PATHNAME  PATHID
1   Bos taurus: Interleukin-6 signaling 5870529
2 Homo sapiens: Interleukin-6 signaling 1059683

 

With reactome.db 1.59.0:

> length(keys(reactome.db, keytype="PATHID"))
[1] 2185
> AnnotationDbi::select(reactome.db, keys=c("Bos taurus: Interleukin-6 signaling", "Homo sapiens: Interleukin-6 signaling"), columns=c("PATHID"), keytype=c("PATHNAME"))
'select()' returned 1:1 mapping between keys and columns
                               PATHNAME  PATHID
1   Bos taurus: Interleukin-6 signaling 1059683
2 Homo sapiens: Interleukin-6 signaling 1059683

As this behavior breaks the build of my package (fgsea) and I need to fix it, I wonder: is this transition final? If yes, maybe it's better to remove the organism-specific prefix altogether? I'm pretty sure it's much better to have a one-to-one map between pathway IDs and pathway names.
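
A one-to-one test for a given build could be sketched as follows. This is a minimal illustration in base R on toy data frames that mirror the two outputs above; the helper name `is_one_to_one` is made up for this example, and the commented-out lines assume reactome.db is installed.

```r
# Hypothetical helper (not part of reactome.db or AnnotationDbi):
# TRUE if every PATHID has exactly one PATHNAME and vice versa.
is_one_to_one <- function(map) {
  map <- unique(map[, c("PATHID", "PATHNAME")])
  !any(duplicated(map$PATHID)) && !any(duplicated(map$PATHNAME))
}

# Toy tables copied from the two select() outputs above.
v158 <- data.frame(
  PATHNAME = c("Bos taurus: Interleukin-6 signaling",
               "Homo sapiens: Interleukin-6 signaling"),
  PATHID   = c("5870529", "1059683"),
  stringsAsFactors = FALSE
)
v159 <- transform(v158, PATHID = c("1059683", "1059683"))

is_one_to_one(v158)  # TRUE
is_one_to_one(v159)  # FALSE

# With the real package (assumes reactome.db is installed):
# library(reactome.db)
# map <- AnnotationDbi::select(reactome.db,
#                              keys = keys(reactome.db, keytype = "PATHID"),
#                              columns = "PATHNAME", keytype = "PATHID")
# is_one_to_one(map)
```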

Best,
Alexey

 

 

willem.ligtenberg wrote:

No, that change is not final.
I was alerted to that issue last weekend. I have already created a new version which results in a one-to-one mapping of pathway-id to pathway name.

Apparently, Reactome has changed their identifiers, and I have revised the way I am building the package.
I mistakenly thought that the R-HSA-<ID> thing was only in their exports, but they have made that change throughout the database and website. (When I started using Reactome the DB_IDs were always numeric.)
So I removed the R-<three letter species>- bit from all ids, which then resulted in the multi mapping issue.
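
The collision can be reproduced with a small sketch. This is not the actual build code: the regular expression is an assumption about how such a prefix might be stripped, and the two IDs are illustrative, chosen to match the Interleukin-6 example in the question (`BTA` as the Bos taurus abbreviation is also an assumption).

```r
# Illustrative IDs only; not read from the package.
ids <- c("R-HSA-1059683", "R-BTA-1059683")

# Strip the "R-<three letter species>-" prefix (assumed pattern).
stripped <- sub("^R-[A-Z]{3}-", "", ids)

anyDuplicated(ids) > 0       # FALSE: full IDs are distinct
anyDuplicated(stripped) > 0  # TRUE: both collapse to "1059683"
```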

After I was alerted to this, I made a package that does not remove the R-HSA- part. So all IDs are now of the form: R-<three letter species abbreviation>-<number>.
I then had a request to also include the old (fully numeric) IDs that were present previously, but I can only do that with up-to-date info for human, which I think is not what people would expect, since there are also users who use Reactome with non-human data.
Another option would be to include the mappings from the previous version, which would mean that the data is stale. (Also sub-optimal.)

Any suggestions on the preferred way forward are welcome. I can build another version relatively quickly.

If you want to test that new version feel free to download it here.


I think the form R-<three letter abbreviation species>-<number> is good enough. I'm not sure why anyone would want the IDs to be numeric only.

written 3 months ago by assaron

Backwards compatibility: if you have a report (knitr/Sweave) that uses the database IDs to look at a specific pathway, it might break now.
However, I do think that the main use case is starting from gene IDs and finding related pathways (first the pathway ID, then the pathway name), and that would still work normally.
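
That lookup chain can be sketched with two toy tables and base R `merge()`. Everything here is illustrative: the gene IDs `g1`/`g2`, the table names, and the `R-HSA-999` entry are placeholders, and with the real package the two steps would be `select()` calls on reactome.db instead.

```r
# Toy gene -> pathway ID table (gene IDs are placeholders).
gene2path <- data.frame(
  GENE   = c("g1", "g1", "g2"),
  PATHID = c("R-HSA-1059683", "R-HSA-999", "R-HSA-1059683"),
  stringsAsFactors = FALSE
)

# Toy pathway ID -> pathway name table ("R-HSA-999" is a placeholder).
path2name <- data.frame(
  PATHID   = c("R-HSA-1059683", "R-HSA-999"),
  PATHNAME = c("Homo sapiens: Interleukin-6 signaling",
               "Homo sapiens: placeholder pathway"),
  stringsAsFactors = FALSE
)

# Step 1: gene -> pathway ID; step 2: pathway ID -> pathway name.
my_genes <- "g2"
hits <- merge(gene2path[gene2path$GENE %in% my_genes, ], path2name,
              by = "PATHID")
hits$PATHNAME
```

Because the pathway ID is only used as a join key between the two steps, the result does not depend on whether the IDs are numeric or prefixed, as long as both tables use the same scheme.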

written 3 months ago by willem.ligtenberg

I see. Well, as you say, the package just reflects the Reactome database, so I'd argue it's better to have IDs consistent with the current Reactome IDs, that is, of the form R-HSA-<ID>.

Btw, did I get it right that a new version with unique IDs is coming very soon?

 

written 3 months ago by assaron

Yes, you did get that right.
You can download it here if you want to test drive it:
https://owncloud.wligtenberg.nl/index.php/s/YfcW3XU19ptgrOR

written 3 months ago by willem.ligtenberg

When should it be available in Bioconductor 3.5? The release date is soon...

written 3 months ago by assaron

Did you verify that this fixes your issue?
These annotation packages are not tracked in version control. If this fixes your issue, I will ask Valerie if she can upload the version that I have on my website.
 

written 3 months ago by willem.ligtenberg

Oh, I see. Yes, it does.

written 3 months ago by assaron

The annotation push script is run nightly, so this should appear in the repo tomorrow (April 21).

written 3 months ago by willem.ligtenberg

I still can't see the updated version in Bioconductor. Is this OK? There are still 2185 PATHIDs in the package. Also, should the reactome.db version change with this update, from 1.59.0 to 1.59.1 or something?

written 3 months ago by assaron

I didn't update the version number (maybe should have...)
 

written 3 months ago by willem.ligtenberg

Can you check with Valerie? It'd be better to have a new version number, because then it would be easier to check whether the package was updated or not.
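
With a version bump, such a check could look roughly like the sketch below. The value `1.59.1` is the hypothetical bumped version mentioned earlier in the thread, not a confirmed release, and in practice `installed` would come from `packageVersion("reactome.db")`.

```r
# In practice: installed <- packageVersion("reactome.db")
installed <- package_version("1.59.0")

# Hypothetical version that would carry the fix.
fixed_in <- package_version("1.59.1")

# Versions compare component-wise, so no string parsing is needed.
has_fix <- installed >= fixed_in
has_fix  # FALSE for 1.59.0
```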

written 3 months ago by assaron

I mailed Valerie an updated package with the version number increase.

written 3 months ago by willem.ligtenberg

mmm, version 60 of Reactome was released, should I run a quick update?

written 3 months ago by willem.ligtenberg
I'd wait with updates until after the release. It seems that even the updated version has not propagated yet.
written 3 months ago by assaron