Misleading information in reactome.db

Thanks for the reactome.db package and the effort to maintain it. I find it extremely useful, which is why I use it so much.

On the Reactome website I can see that a gene I am interested in is part of R-HSA-6805546, the set "Keratin type II, epithelial [cytosol]", which is part of "Keratin type II [cytosol]", which in turn is part of a reaction.

However, using reactome.db I couldn't find any REACTOMEID associated with it. (While making a minimal reproducible example I found a couple of bugs.)

> library(reactome.db)
> select(reactome.db, key = "3855", keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

> mapIds(reactome.db, key = "3855", keytype = "ENTREZID", column = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

> keytypes(reactome.db)
[1] "ENTREZID"   "GO"         "PATHID"     "PATHNAME"   "REACTOMEID"

> library("org.Hs.eg.db")

> entrezids <- keys(org.Hs.eg.db, keytype = "ENTREZID")
> a <- select(reactome.db, key = entrezids, keytype = "ENTREZID", columns = "REACTOMEID")
'select()' returned 1:many mapping between keys and columns

> a[a$ENTREZID == "3855",  ]
19433     3855       <NA>

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C             
[3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8   
[7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                
[9] LC_ADDRESS=C               LC_TELEPHONE=C           

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] org.Hs.eg.db_3.4.0   reactome.db_1.58.0   AnnotationDbi_1.36.2
[4] IRanges_2.8.1        S4Vectors_0.12.1     Biobase_2.34.0     
[7] BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] DBI_0.5-1     memoise_1.0.0 Rcpp_0.12.9   RSQLite_1.1-2 digest_0.6.12


I know that the package has some difficulties (see "Unable to see reactome.db dbschema"), but how are the REACTOMEIDs obtained?

Could all genes be mapped to the reactions they are involved in (if any), even when a gene participates in a reaction through a complex or a set?

The database schema has tables describing sets, complex entities, and reactions. These might be the way to annotate genes to pathways, but I don't know whether the data is in the package or not.
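To make the question concrete, here is a minimal sketch of what "mapping through a set or complex" could look like: follow "is part of" links upward from the gene until a reaction is reached. The edge list is a toy built from the chain described above; `reaction_X` and the function `ancestors()` are hypothetical names, and nothing here reflects how reactome.db is actually built.

```r
# Toy "is part of" edges (child -> parent). The first three entries follow the
# chain described in the question; "reaction_X" is a hypothetical placeholder
# because the question does not name the reaction.
part_of <- list(
  "3855" = "Keratin type II, epithelial [cytosol]",
  "Keratin type II, epithelial [cytosol]" = "Keratin type II [cytosol]",
  "Keratin type II [cytosol]" = "reaction_X"
)

# Collect every entity reachable from `id` by following "is part of" edges
# (breadth-first; entities already seen are not queued again).
ancestors <- function(id, edges) {
  seen <- character(0)
  queue <- id
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    parents <- edges[[current]]      # NULL when `current` has no parent
    new <- setdiff(parents, seen)
    seen <- c(seen, new)
    queue <- c(queue, new)
  }
  seen
}

# Keep only the ancestors that are reactions.
reactions <- "reaction_X"
gene_reactions <- intersect(ancestors("3855", part_of), reactions)
print(gene_reactions)
```

With real data the edge list would have to come from the set, complex, and reaction tables of the Reactome schema, which is exactly the part I am unsure the package contains.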

As the maintainer of reactome.db, I am sorry for the inconsistencies.
Currently, the reactome.db package is generated using a Perl script that retrieves information from a locally installed Reactome database, because Reactome previously did not provide all the mappings we required.
However, it seems that this has changed over time, and I had not noticed. (I don't use Reactome as often as I used to.)
I want to create a new version of reactome.db soon, which will be based on the mappings that they currently provide: http://reactome.org/download/current/NCBI2Reactome_All_Levels.txt
I already checked that file, and it does include the mapping you are looking for with Entrez Gene ID 3855.
Your issue makes it quite clear again that I need to do this sooner rather than later. I will at least make a start this evening, because nobody wants inconsistencies in this kind of information.
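For anyone who wants to check a gene against that file directly, a rough sketch in base R follows. The six-column layout (gene ID, pathway ID, URL, event name, evidence code, species) is an assumption about the download and should be verified against the file itself; `read_mapping()` and the column names are hypothetical, and the sample rows are made up for illustration rather than copied from Reactome.

```r
# Assumed column layout of the tab-separated mapping file (no header line).
cols <- c("gene_id", "pathway_id", "url", "event_name", "evidence", "species")

# `src` may be a file path, a URL, or an open connection.
read_mapping <- function(src) {
  read.delim(src, header = FALSE, col.names = cols, quote = "",
             colClasses = "character", stringsAsFactors = FALSE)
}

# Illustrative rows only: URL, name, and evidence fields are placeholders.
sample_lines <- c(
  "3855\tR-HSA-6805567\tURL_PLACEHOLDER\tNAME_PLACEHOLDER\tEVIDENCE\tHomo sapiens",
  "3855\tR-HSA-6809371\tURL_PLACEHOLDER\tNAME_PLACEHOLDER\tEVIDENCE\tHomo sapiens",
  "9999999\tR-HSA-0000000\tURL_PLACEHOLDER\tNAME_PLACEHOLDER\tEVIDENCE\tHomo sapiens"
)

mapping <- read_mapping(textConnection(sample_lines))

# Pathway identifiers recorded for Entrez Gene ID 3855.
hits <- mapping[mapping$gene_id == "3855", "pathway_id"]
print(hits)
```

Passing the URL above to `read_mapping()` instead of the text connection should read the real file, provided the column assumption holds.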

Many thanks for your fast response. I hope to hear news soon.

Hi, can you please try the following package and see if it works?


I couldn't use the reactome.db object (it is not exported in the NAMESPACE; maybe it should be generated somehow in zzz.R), and reactome_dbschema() doesn't work either, but genes now seem to be correctly mapped to the pathways:

> install.packages("reactome.db_1.0.59.tar.gz", repos = NULL)
Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-library/3.3’
(as ‘lib’ is unspecified)

* installing *source* package ‘reactome.db’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (reactome.db)
> library(reactome.db)
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
    is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
    Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    colMeans, colSums, expand.grid, rowMeans, rowSums
> select(reactome.db, key = "3855", keytype = "ENTREZID", columns = "REACTOMEID")
Error in select(reactome.db, key = "3855", keytype = "ENTREZID", columns = "REACTOMEID") :
  object 'reactome.db' not found

>  ls("package:reactome.db")
 [1] "reactome"              "reactome_dbconn"       "reactome_dbfile"      
 [4] "reactome_dbInfo"       "reactome_dbschema"     "reactomeEXTID2PATHID"
 [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"     "reactomePATHID2EXTID"
[10] "reactomePATHID2NAME"   "reactomePATHNAME2ID"   "reactomeREACTOMEID2GO"
> reactome_dbInfo()
                name                                     value
1           DBSCHEMA                               REACTOME_DB
2    DBSCHEMAVERSION                                        59
3         SOURCENAME                                  Reactome
4          SOURCEURL http://www.reactome.org/download/current/
5         SOURCEDATE                                2017-02-27
6 Supporting package                             AnnotationDbi
7            Db type                                ReactomeDb
> reactome_dbschema()

Warning messages:
1: In file(con, "r") :
  file("") only supports open = "w+" and open = "w+b": using the former
2: In max(ii) : no non-missing arguments to max; returning -Inf
> xx <- as.list(reactomeEXTID2PATHID)
> xx[["3855"]]
[1] "1266738" "6805567" "6809371"

If you need more help testing it, I am glad to help.

Thank you for the feedback, I will look into those issues.

I noticed some other changes in the new version whose cause I don't know; it may be the new mapping or the updated data.

There are now fewer genes in reactomeEXTID2PATHID: 27956, down from 65445 previously. Likewise, the 17151 pathways(?) in the previous version are now down to 1932.

reactomePATHNAME2ID now has 21998 pathways, but 9 of the pathways in reactomeEXTID2PATHID are missing from it, and 255 pathways annotated in reactomePATHNAME2ID are not annotated in reactomeEXTID2PATHID.


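In the latest version:

> reactomeMAPCOUNTS
 reactomeEXTID2PATHID reactomeGO2REACTOMEID  reactomePATHID2EXTID
                27956                  1791                  1932
  reactomePATHID2NAME   reactomePATHNAME2ID reactomeREACTOMEID2GO
                 2176                 21968                 10554

In the previous version:

> reactomeMAPCOUNTS
 reactomeEXTID2PATHID reactomeGO2REACTOMEID  reactomePATHID2EXTID
                65445                  1796                 17151
  reactomePATHID2NAME   reactomePATHNAME2ID reactomeREACTOMEID2GO
                20800                 20767                 51734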
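In case it helps with debugging, this is roughly how such overlaps between two maps can be counted. It is a sketch on toy lists standing in for `as.list(reactomeEXTID2PATHID)` and `as.list(reactomePATHNAME2ID)`; every ID and name in it is made up.

```r
# Toy stand-ins for the two maps; all gene IDs, pathway IDs, and names are
# hypothetical.
extid2pathid <- list("g1" = c("p1", "p2"), "g2" = c("p2", "p3"))
pathname2id  <- list("name A" = "p1", "name B" = "p2", "name C" = "p4")

# Flatten each map to the set of pathway IDs it mentions.
ids_with_genes <- unique(unlist(extid2pathid, use.names = FALSE))
ids_with_names <- unique(unlist(pathname2id, use.names = FALSE))

# Pathways that have genes but no name, and pathways that have a name but no genes.
unnamed_pathways  <- setdiff(ids_with_genes, ids_with_names)
geneless_pathways <- setdiff(ids_with_names, ids_with_genes)

print(unnamed_pathways)
print(geneless_pathways)
```

The two `setdiff()` calls correspond to the two kinds of mismatch described above: pathways in one map that are absent from the other, in each direction.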

Ah, yes, this is what I was afraid of. (I didn't have time to check it myself yesterday.)
I switched from using a Perl script that mapped genes to reactions/pathways to the mappings provided on their website.

My first hunch is that they limit their mappings to human genes, and I did not.
The only issue is that my mappings were inconsistent with theirs, which is annoying for the end user.
Mapping genes to reactions is not trivial in Reactome, since there are many different objects used to model the various reactions.
Ah well, some more work for me to do then. :)

