Thanks for the reactome.db package and the effort to maintain it. I find it extremely useful, which is why I use it so much.
On the Reactome website I can see that a gene I am interested in is part of R-HSA-6805546, the set "Keratin type II, epithelial [cytosol]", which is part of "Keratin type II [cytosol]", which in turn is part of a reaction.
However, using reactome.db I couldn't find any REACTOMEID associated with it. (While making a minimal reproducible example I found a couple of bugs.)
> library(reactome.db)
> select(reactome.db, key = "3855", keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> mapIds(reactome.db, key = "3855", keytype = "ENTREZID", column = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> keytypes(reactome.db)
[1] "ENTREZID"   "GO"         "PATHID"     "PATHNAME"   "REACTOMEID"
> library("org.Hs.eg.db")
> entrezids <- keys(org.Hs.eg.db, keytype = "ENTREZID")
> a <- select(reactome.db, key = entrezids, keytype = "ENTREZID", columns = "REACTOMEID")
'select()' returned 1:many mapping between keys and columns
> a[a$ENTREZID == "3855", ]
      ENTREZID REACTOMEID
19433     3855       <NA>

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] org.Hs.eg.db_3.4.0   reactome.db_1.58.0   AnnotationDbi_1.36.2
[4] IRanges_2.8.1        S4Vectors_0.12.1     Biobase_2.34.0
[7] BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] DBI_0.5-1     memoise_1.0.0 Rcpp_0.12.9   RSQLite_1.1-2 digest_0.6.12
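A single unmapped key makes select() stop with an error, while the same key inside a larger vector simply comes back as NA. One way to get consistent behaviour is to check the key against keys() first. This is only a minimal sketch: lookup_or_na is a hypothetical helper name, not part of reactome.db or AnnotationDbi, and the reactome.db part only runs if the package is installed.

```r
# Hypothetical helper (not part of any package): look up one key and
# return NA instead of raising an error when the key is not in the database.
lookup_or_na <- function(key, valid_keys, lookup) {
  if (key %in% valid_keys) lookup(key) else NA_character_
}

# Usage with reactome.db, attempted only if the package is available.
if (requireNamespace("reactome.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(reactome.db))
  valid <- keys(reactome.db, keytype = "ENTREZID")
  res <- lookup_or_na("3855", valid, function(k) {
    select(reactome.db, keys = k, keytype = "ENTREZID",
           columns = "REACTOMEID")$REACTOMEID
  })
  print(res)  # NA if the gene has no mapping in the installed version
}
```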
I know that the package has some difficulties (A: Unable to see reactome.db dbschema), but how are the REACTOMEID values obtained?
Could all genes be mapped to the reactions (if they are involved in any), even if a gene participates in a reaction through a complex or a set?
In the database schema there are some tables describing sets, complex entities and reactions. This might be the way to annotate those to pathways, but I don't know whether that data is in the package or not.
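To make the question concrete, this is roughly what I mean by mapping a gene through sets and complexes. It is only a toy sketch: the members list is a hypothetical stand-in for the set/complex tables of the schema (I don't know their real layout), and leaf_genes is a made-up helper name. It assumes the membership structure has no cycles.

```r
# Toy membership table (hypothetical layout): each entity maps to its
# direct members; anything that is not a name in the list is a gene.
members <- list(
  "reaction" = c("Keratin type II [cytosol]"),
  "Keratin type II [cytosol]" = c("Keratin type II, epithelial [cytosol]"),
  "Keratin type II, epithelial [cytosol]" = c("3855")
)

# Recursively expand an entity into the genes it ultimately contains.
leaf_genes <- function(entity, members) {
  children <- members[[entity]]
  if (is.null(children)) {
    return(entity)  # no members listed: treat it as a gene
  }
  unique(unlist(lapply(children, leaf_genes, members = members)))
}

leaf_genes("reaction", members)
```

With something like this, every gene reachable from a reaction would get that reaction's ID, regardless of how deeply it is nested.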
Many thanks for your fast response. I hope to hear news soon.
Hi, can you please try the following package and see if it works?
https://owncloud.wligtenberg.nl/index.php/s/zxb0t7hq2wt6BzM
I couldn't use the reactome.db object (it is not exported in the NAMESPACE; maybe it should be generated somehow in zzz.R), and reactome_dbInfo() does not work either, but now it seems that genes are correctly mapped to the pathways:
If you need more help testing it, I am glad to help.
Thank you for the feedback, I will look into those issues.
I noticed some other changes in the new version, and I don't know what causes them: maybe the new mapping, or the update of the data.
There are now fewer genes in reactomeEXTID2PATHID: 27956, down from 65445 previously. The number of pathways(?) also dropped, from 17151 in the previous version to 1932 now.
In reactomePATHNAME2ID there are now 21998 pathways, but 9 of the pathways in reactomeEXTID2PATHID are not there, and 255 pathways annotated in reactomePATHNAME2ID are not annotated in reactomeEXTID2PATHID.
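For reference, a comparison of this kind can be done with setdiff(). This is only a sketch on toy data: compare_pathway_sets is a hypothetical helper, the pathway IDs P1 to P4 and the gene 1000 are made up, and I am assuming that as.list(reactomeEXTID2PATHID) and as.list(reactomePATHNAME2ID) give named lists of the same shape as the toy ones.

```r
# Hypothetical helper: compare the pathway IDs reachable from genes with
# the pathway IDs that have a name. Both inputs are named lists whose
# values are pathway IDs.
compare_pathway_sets <- function(ext2path, name2id) {
  from_genes <- unique(unlist(ext2path, use.names = FALSE))
  from_names <- unique(unlist(name2id, use.names = FALSE))
  # Drop NA entries, which unmapped keys may produce.
  from_genes <- from_genes[!is.na(from_genes)]
  from_names <- from_names[!is.na(from_names)]
  list(
    without_name  = setdiff(from_genes, from_names),
    without_genes = setdiff(from_names, from_genes)
  )
}

# Toy data (made-up IDs), standing in for the two maps.
ext2path <- list("3855" = c("P1", "P2"), "1000" = "P3")
name2id  <- list("Pathway one" = "P1", "Pathway two" = "P2",
                 "Pathway four" = "P4")

result <- compare_pathway_sets(ext2path, name2id)
lengths(result)
```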
In the latest version:
 reactomeEXTID2PATHID reactomeGO2REACTOMEID  reactomePATHID2EXTID
                27956                  1791                  1932
  reactomePATHID2NAME   reactomePATHNAME2ID reactomeREACTOMEID2GO
                 2176                 21968                 10554
Previously:
> reactomeMAPCOUNTS
 reactomeEXTID2PATHID reactomeGO2REACTOMEID  reactomePATHID2EXTID
                65445                  1796                 17151
  reactomePATHID2NAME   reactomePATHNAME2ID reactomeREACTOMEID2GO
                20800                 20767                 51734
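To see the size of each change at a glance, the two sets of counts can be put side by side. This is just a sketch that re-types the numbers quoted in this thread into named vectors; the variable names are mine.

```r
# Map counts as quoted in this thread (previous vs. latest version).
previous <- c(
  reactomeEXTID2PATHID = 65445, reactomeGO2REACTOMEID = 1796,
  reactomePATHID2EXTID = 17151, reactomePATHID2NAME = 20800,
  reactomePATHNAME2ID = 20767, reactomeREACTOMEID2GO = 51734
)
latest <- c(
  reactomeEXTID2PATHID = 27956, reactomeGO2REACTOMEID = 1791,
  reactomePATHID2EXTID = 1932, reactomePATHID2NAME = 2176,
  reactomePATHNAME2ID = 21968, reactomeREACTOMEID2GO = 10554
)

# Align by name, then compute the absolute and relative change per map.
latest <- latest[names(previous)]
change <- data.frame(
  previous   = previous,
  latest     = latest,
  difference = latest - previous,
  ratio      = round(latest / previous, 2)
)
print(change)
```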
Ah, yes, this was what I was afraid of. (I didn't have time to check it myself yesterday.)
I switched from using a Perl script that did the mapping of genes to reactions/pathways, to the mappings provided on their website.
My first hunch would be that they limit their mappings to human genes, and I did not.
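A quick way to test that hunch would be to count the pathway IDs per species. This is only a sketch: it assumes the IDs follow the R-XXX-number form seen in R-HSA-6805546 (with HSA for human and MMU for mouse), species_code is a hypothetical helper, and apart from R-HSA-6805546 the IDs are made up.

```r
# Hypothetical helper: extract the three-letter species code from stable
# IDs of the form "R-HSA-6805546"; anything else becomes NA.
species_code <- function(ids) {
  pattern <- "^R-([A-Z]{3})-[0-9]+$"
  ifelse(grepl(pattern, ids), sub(pattern, "\\1", ids), NA_character_)
}

# Toy IDs: only the first one comes from this thread, the rest are made up.
ids <- c("R-HSA-6805546", "R-MMU-1000001", "R-HSA-1000002", "12345")

# Number of IDs per species, including IDs that do not match the pattern.
table(species_code(ids), useNA = "ifany")

# Keep only the human entries.
human_only <- ids[which(species_code(ids) == "HSA")]
human_only
```

If the old tables contain many non-HSA IDs and the new ones do not, that would explain most of the drop.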
The only issue is that my mappings were inconsistent with their mappings, which is annoying for the end user.
The mapping of genes to reactions is not trivial in Reactome, since there are many different objects used to model the various reactions.
Ah well, some more work for me to do then. :)