I just found out that the recent reactome.db version (1.59.0) changed the pathway IDs (PATHID column). Before, there were 20800 pathways, each specific to an organism. Now there are 2185 pathways, shared between organisms. I'm not sure what the reasons for the change were, and I have no problem with either scheme. However, I find it inconsistent that a single pathway ID now maps to multiple pathway names.
With reactome.db 1.58.0:
> length(keys(reactome.db, keytype="PATHID"))
[1] 20800
> AnnotationDbi::select(reactome.db, keys=c("Bos taurus: Interleukin-6 signaling", "Homo sapiens: Interleukin-6 signaling"), columns=c("PATHID"), keytype=c("PATHNAME"))
'select()' returned 1:1 mapping between keys and columns
                               PATHNAME  PATHID
1   Bos taurus: Interleukin-6 signaling 5870529
2 Homo sapiens: Interleukin-6 signaling 1059683
With reactome.db 1.59.0:
> length(keys(reactome.db, keytype="PATHID"))
[1] 2185
> AnnotationDbi::select(reactome.db, keys=c("Bos taurus: Interleukin-6 signaling", "Homo sapiens: Interleukin-6 signaling"), columns=c("PATHID"), keytype=c("PATHNAME"))
'select()' returned 1:1 mapping between keys and columns
                               PATHNAME  PATHID
1   Bos taurus: Interleukin-6 signaling 1059683
2 Homo sapiens: Interleukin-6 signaling 1059683
As this behavior breaks the build of my package (fgsea) and I need to fix it, I wonder: is this transition final? If yes, maybe it would be better to remove the organism-specific prefix altogether? I'm pretty sure it's much better to have a one-to-one map between pathway IDs and pathway names.
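For anyone who wants to check the one-to-one property on their own installation, here is a minimal sketch. The helper name `ambiguous_pathids` is hypothetical (not part of reactome.db or AnnotationDbi); it works on any data frame with PATHID and PATHNAME columns, such as the `select()` results shown above, and the sample rows are copied from the 1.59.0 output.

```r
# Return the PATHIDs that map to more than one distinct PATHNAME.
# `mapping` is any data frame with PATHID and PATHNAME columns
# (hypothetical helper, not part of any package).
ambiguous_pathids <- function(mapping) {
  pairs <- unique(mapping[, c("PATHID", "PATHNAME")])
  counts <- table(pairs$PATHID)
  names(counts)[as.vector(counts > 1)]
}

# The two rows returned by reactome.db 1.58.0 in this post.
v158 <- data.frame(
  PATHNAME = c("Bos taurus: Interleukin-6 signaling",
               "Homo sapiens: Interleukin-6 signaling"),
  PATHID = c("5870529", "1059683"),
  stringsAsFactors = FALSE
)

# The two rows returned by reactome.db 1.59.0 in this post.
v159 <- data.frame(
  PATHNAME = c("Bos taurus: Interleukin-6 signaling",
               "Homo sapiens: Interleukin-6 signaling"),
  PATHID = c("1059683", "1059683"),
  stringsAsFactors = FALSE
)

ambiguous_pathids(v158)  # no ambiguous IDs in the 1.58.0 rows
ambiguous_pathids(v159)  # reports the shared ID from the 1.59.0 rows
```

On the full database, the same function could be applied to the result of a `select()` call that returns both columns for all PATHID keys.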
Best,
Alexey
I think the form R-<three-letter species abbreviation>-<number> is good enough. Not sure why anyone would want the IDs to be numeric only.
Backwards compatibility: if you have a report (knitr/Sweave) that uses the database IDs to look at a specific pathway, it might break now.
However, I do think the main use case is starting from gene IDs and finding the related pathways (first the pathway ID, then the pathway name), and that would still work normally.
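The gene-to-pathway use case described above could look roughly like the sketch below. It uses `AnnotationDbi::select()` with the ENTREZID, PATHID and PATHNAME columns; the wrapper name `pathways_for_genes` and the example gene ID are assumptions for illustration, not something from this thread.

```r
# Two-step lookup: gene IDs -> pathway IDs -> pathway names.
# `db` is an AnnotationDbi-style annotation object such as reactome.db.
# (hypothetical wrapper; assumes at least one gene ID is known to `db`)
pathways_for_genes <- function(db, gene_ids) {
  # Step 1: gene IDs to pathway IDs.
  ids <- AnnotationDbi::select(db, keys = gene_ids,
                               columns = "PATHID", keytype = "ENTREZID")
  ids <- stats::na.omit(ids)
  # Step 2: pathway IDs to pathway names.
  nms <- AnnotationDbi::select(db, keys = unique(ids$PATHID),
                               columns = "PATHNAME", keytype = "PATHID")
  merge(ids, nms, by = "PATHID")
}

if (requireNamespace("reactome.db", quietly = TRUE)) {
  # "7157" is an illustrative Entrez gene ID (human TP53).
  print(head(pathways_for_genes(reactome.db::reactome.db, "7157")))
}
```

If one pathway ID carries several names, as in 1.59.0, the merge in step 2 would presumably return one row per name, which is where downstream code can start to misbehave.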
I see. Well, as you say, the package just reflects the Reactome database, so I'd argue it's better to have IDs consistent with the current Reactome IDs, that is, of the form R-HSA-<ID>.
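To tell the two ID styles apart in existing code or reports, a simple pattern check may be enough. This is only a sketch: the helper name `is_prefixed_id` is hypothetical, and the regular expression assumes exactly three uppercase letters for the species part, as in the R-HSA-<ID> form mentioned here.

```r
# TRUE for IDs of the form R-<three uppercase letters>-<digits>,
# FALSE for anything else, including the old numeric-only IDs.
# (hypothetical helper, not part of reactome.db)
is_prefixed_id <- function(ids) {
  grepl("^R-[A-Z]{3}-[0-9]+$", ids)
}

# One prefixed ID built from the number in this thread, and the bare number.
is_prefixed_id(c("R-HSA-1059683", "1059683"))
```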
Btw, did I get it right that a new version with unique IDs is coming very soon?
Yes, you did get that right.
You can download it here if you want to test drive it:
https://owncloud.wligtenberg.nl/index.php/s/YfcW3XU19ptgrOR
When should it be available in Bioconductor 3.5? The release date is coming soon...
Did you verify this fixes your issue?
These annotation packages are not tracked in version control. If this fixes your issue, I will ask Valerie if she can upload the version that I have on my website.
Oh, I see. Yes, it does.
The annotation push script is run nightly, so this should appear in the repo tomorrow (April 21).
I still can't see the updated version in Bioconductor. Is this OK? There are still 2185 PATHIDs in the package. Also, should the reactome.db version change with this update, from 1.59.0 to 1.59.1 or something?
I didn't update the version number (maybe should have...)
Can you check with Valerie? A new version number would be better, because it would make it easier to check whether the package was updated.
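With a bumped version number, the check could be as simple as the sketch below. The helper `is_at_least` is hypothetical, and "1.59.1" is only the example version floated earlier in this thread, not a confirmed release number.

```r
# Compare an installed version against a required minimum.
# Accepts version strings or package_version objects.
# (hypothetical helper, not part of any package)
is_at_least <- function(installed, required) {
  as.package_version(installed) >= as.package_version(required)
}

if (requireNamespace("reactome.db", quietly = TRUE)) {
  installed <- utils::packageVersion("reactome.db")
  # "1.59.1" is an assumed version number for the fixed package.
  message("reactome.db ", format(installed),
          " has the fix: ", is_at_least(installed, "1.59.1"))
}
```

Without a version bump, the only way to tell the packages apart is to look at the data itself, for example by counting the PATHID keys as done at the top of this thread.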
I mailed Valerie an updated package with the version number increased.
Hmm, version 60 of Reactome was released. Should I run a quick update?