Misleading information in reactome.db
@lluis-revilla-sancho
European Union

Thanks for the reactome.db package and the effort to maintain it. I find it extremely useful, which is why I use it so much.

On the Reactome website I can see that a gene I am interested in (Entrez Gene ID 3855) is part of R-HSA-6805546, the set "Keratin type II, epithelial [cytosol]", which is part of "Keratin type II [cytosol]", which in turn is part of a reaction.

However, using reactome.db I couldn't find any REACTOMEID associated with it. (While preparing a minimal reproducible example I also ran into a couple of bugs.)

> library(reactome.db)
> select(reactome.db, key = "3855", keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

> mapIds(reactome.db, key = "3855", keytype = "ENTREZID", column = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

> keytypes(reactome.db)
[1] "ENTREZID"   "GO"         "PATHID"     "PATHNAME"   "REACTOMEID"

> library("org.Hs.eg.db")

> entrezids <- keys(org.Hs.eg.db, keytype = "ENTREZID")
> a <- select(reactome.db, key = entrezids, keytype = "ENTREZID", columns = "REACTOMEID")
'select()' returned 1:many mapping between keys and columns

> a[a$ENTREZID == "3855",  ]
      ENTREZID REACTOMEID
19433     3855       <NA>

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C             
[3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8   
[5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8  
[7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                
[9] LC_ADDRESS=C               LC_TELEPHONE=C           
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] org.Hs.eg.db_3.4.0   reactome.db_1.58.0   AnnotationDbi_1.36.2
[4] IRanges_2.8.1        S4Vectors_0.12.1     Biobase_2.34.0     
[7] BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] DBI_0.5-1     memoise_1.0.0 Rcpp_0.12.9   RSQLite_1.1-2 digest_0.6.12

 

I know that the package has some known difficulties (see "A: Unable to see reactome.db dbschema"), but how are the REACTOMEID values obtained?

Could all genes be mapped to reactions (if they are involved in any), even when a gene participates in a reaction through a complex or a set?

The database schema has tables describing sets, complex entities and reactions. These might be the way to annotate such genes to pathways, but I don't know whether that data is in the package.

@willemligtenberg-6989
Netherlands

As the maintainer of reactome.db, I am sorry for the inconsistencies.
Currently, the reactome.db package is generated using a Perl script that retrieves information from a locally installed Reactome database, because Reactome previously did not provide all the mappings we required.
However, it seems that this has changed over time, and I had not noticed. (I don't use Reactome as often as I used to.)
I want to create a new version of reactome.db soon, based on the mappings that they currently provide: http://reactome.org/download/current/NCBI2Reactome_All_Levels.txt
I already checked that file, and it does include the mapping you are looking for with Entrez Gene ID 3855.
Your issue makes it quite clear, once again, that I need to do this sooner rather than later. I will at least make a start this evening, because nobody wants inconsistencies in this kind of information.
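
For illustration, here is a minimal R sketch of the kind of lookup such a mapping file allows. It assumes the file is tab-separated, has no header, and has six columns (gene ID, Reactome stable ID, URL, event name, evidence code, species). The column names, the sample rows and the helper pathways_for_gene() are made up for this example and are not part of reactome.db.

```r
# Column labels are my own choice; the file is assumed to have no header row.
cols <- c("entrez_id", "reactome_id", "url", "event_name", "evidence", "species")

# Illustrative rows only (NOT copied from the real file); fields are tab-separated.
sample_rows <- c(
  paste("3855", "R-HSA-6805567", "url-1", "event name 1", "TAS", "Homo sapiens", sep = "\t"),
  paste("3855", "R-HSA-6809371", "url-2", "event name 2", "TAS", "Homo sapiens", sep = "\t"),
  paste("99999", "R-MMU-0000001", "url-3", "event name 3", "IEA", "Mus musculus", sep = "\t")
)

# Read everything as character so that gene IDs are not converted to numbers.
mapping <- read.delim(text = sample_rows, header = FALSE, quote = "",
                      comment.char = "", colClasses = "character",
                      col.names = cols)

# Hypothetical helper: Reactome IDs annotated to one gene in one species.
pathways_for_gene <- function(mapping, entrez_id, species = "Homo sapiens") {
  hit <- mapping$entrez_id == entrez_id & mapping$species == species
  unique(mapping$reactome_id[hit])
}

pathways_for_gene(mapping, "3855")
```

To try this on the real file, the `text = sample_rows` argument would be replaced by the path of a downloaded copy, keeping the same column assumptions.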


Many thanks for your fast response. I hope to hear news soon.


Hi, can you please try the following package and see if it works?

https://owncloud.wligtenberg.nl/index.php/s/zxb0t7hq2wt6BzM


I couldn't use the reactome.db object (it is not exported in the NAMESPACE; maybe it should be generated somehow in zzz.R), and reactome_dbschema() does not work either, but genes now seem to be correctly mapped to pathways:

> install.packages("reactome.db_1.0.59.tar.gz", repos = NULL)
Installing package into ‘/home/user/R/x86_64-pc-linux-gnu-library/3.3’
(as ‘lib’ is unspecified)

* installing *source* package ‘reactome.db’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (reactome.db)
>
> library(reactome.db)
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colnames, do.call,
    duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect,
    is.unsorted, lapply, lengths, Map, mapply, match, mget, order,
    paste, pmax, pmax.int, pmin, pmin.int, Position, rank, rbind,
    Reduce, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit, which, which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:base’:

    colMeans, colSums, expand.grid, rowMeans, rowSums
> select(reactome.db, key = "3855", keytype = "ENTREZID", columns = "REACTOMEID")
Error in select(reactome.db, key = "3855", keytype = "ENTREZID", columns = "REACTOMEID") :
  object 'reactome.db' not found

>  ls("package:reactome.db")
 [1] "reactome"              "reactome_dbconn"       "reactome_dbfile"      
 [4] "reactome_dbInfo"       "reactome_dbschema"     "reactomeEXTID2PATHID"
 [7] "reactomeGO2REACTOMEID" "reactomeMAPCOUNTS"     "reactomePATHID2EXTID"
[10] "reactomePATHID2NAME"   "reactomePATHNAME2ID"   "reactomeREACTOMEID2GO"
> reactome_dbInfo()
                name                                     value
1           DBSCHEMA                               REACTOME_DB
2    DBSCHEMAVERSION                                        59
3         SOURCENAME                                  Reactome
4          SOURCEURL http://www.reactome.org/download/current/
5         SOURCEDATE                                2017-02-27
6 Supporting package                             AnnotationDbi
7            Db type                                ReactomeDb
> reactome_dbschema()

Warning messages:
1: In file(con, "r") :
  file("") only supports open = "w+" and open = "w+b": using the former
2: In max(ii) : no non-missing arguments to max; returning -Inf
> xx <- as.list(reactomeEXTID2PATHID)
> xx[["3855"]]
[1] "1266738" "6805567" "6809371"

If you need more help testing it, I am glad to help.


Thank you for the feedback, I will look into those issues.


I noticed some other changes in the new version whose cause I don't know; it may be the new mapping or the updated data.

There are now fewer genes in reactomeEXTID2PATHID: 27956, down from 65445 previously. The number of pathways(?) also dropped, from 17151 in the previous version to 1932 now.

reactomePATHNAME2ID now has 21968 pathways, but 9 of the pathways in reactomeEXTID2PATHID are missing from it, and 255 pathways annotated in reactomePATHNAME2ID are not annotated in reactomeEXTID2PATHID.
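
A comparison like this can be made with setdiff() on the pathway IDs of both maps. The sketch below uses toy lists that only imitate the shape I would expect from as.list(reactomeEXTID2PATHID) and as.list(reactomePATHNAME2ID); the gene names, pathway names and IDs are invented for the example.

```r
# Toy stand-ins for the two maps (invented values, not real Reactome data).
ext2path <- list(g1 = c("p1", "p2"), g2 = c("p2", "p3"))
name2id  <- list("Name one" = "p1", "Name two" = "p2", "Name four" = "p4")

# Collect the distinct pathway IDs that appear in each map.
ids_with_genes <- unique(unlist(ext2path, use.names = FALSE))
ids_with_names <- unique(unlist(name2id, use.names = FALSE))

# Pathways that have genes but no name, and pathways that have a name but no genes.
genes_only <- setdiff(ids_with_genes, ids_with_names)
names_only <- setdiff(ids_with_names, ids_with_genes)

genes_only
names_only
```

With the real package, the two toy lists would be replaced by the as.list() output of the corresponding maps.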

 

In the latest version:

reactomeEXTID2PATHID reactomeGO2REACTOMEID  reactomePATHID2EXTID
                27956                  1791                  1932
  reactomePATHID2NAME   reactomePATHNAME2ID reactomeREACTOMEID2GO
                 2176                 21968                 10554

Previously:

> reactomeMAPCOUNTS
 reactomeEXTID2PATHID reactomeGO2REACTOMEID  reactomePATHID2EXTID
                65445                  1796                 17151
  reactomePATHID2NAME   reactomePATHNAME2ID reactomeREACTOMEID2GO
                20800                 20767                 51734
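
To see at a glance which maps shrank, the two sets of counts above can be put side by side. This is only a small sketch: the numbers are typed in by hand from the two outputs, and the object names (previous, latest, change) are my own.

```r
# Counts copied by hand from the two reactomeMAPCOUNTS outputs above.
previous <- c(reactomeEXTID2PATHID = 65445, reactomeGO2REACTOMEID = 1796,
              reactomePATHID2EXTID = 17151, reactomePATHID2NAME = 20800,
              reactomePATHNAME2ID = 20767, reactomeREACTOMEID2GO = 51734)
latest   <- c(reactomeEXTID2PATHID = 27956, reactomeGO2REACTOMEID = 1791,
              reactomePATHID2EXTID = 1932, reactomePATHID2NAME = 2176,
              reactomePATHNAME2ID = 21968, reactomeREACTOMEID2GO = 10554)

# Align by name before comparing, in case the two vectors are ordered differently.
latest <- latest[names(previous)]

change <- data.frame(previous = previous,
                     latest = latest,
                     difference = latest - previous,
                     ratio = round(latest / previous, 2),
                     row.names = names(previous))
change

# Maps that grew instead of shrinking.
rownames(change)[change$latest > change$previous]
```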


Ah, yes, this is what I was afraid of. (I didn't have time to check it myself yesterday.)
I switched from using a Perl script that mapped genes to reactions/pathways to the mappings provided on their website.

My first hunch is that they limit their mappings to human genes, whereas I did not.
The only issue is that my mappings were inconsistent with theirs, which is annoying for the end user.
Mapping genes to reactions is not trivial in Reactome, since there are many different objects used to model the various reactions.
Ah well, some more work for me to do then. :)
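
To give an idea of why the mapping is not trivial: a gene can sit inside a set, which sits inside another set or a complex, which only then takes part in a reaction, so the participants have to be expanded recursively. The sketch below shows that idea on a made-up structure. The entity names, the `contains` list and both helper functions are hypothetical and do not reflect the actual Reactome schema or the Perl script.

```r
# Made-up containment structure: each name maps to its direct components.
# Anything that is not a name in this list is treated as a gene (a leaf).
contains <- list(
  reaction_1 = c("set_outer"),
  set_outer  = c("set_inner", "gene_B"),
  set_inner  = c("gene_A"),
  reaction_2 = c("complex_1"),
  complex_1  = c("gene_B", "gene_C")
)

# Recursively expand an entity into the genes it ultimately contains.
expand_to_genes <- function(entity, contains, seen = character()) {
  if (entity %in% seen) return(character())   # guard against cycles
  parts <- contains[[entity]]
  if (is.null(parts)) return(entity)          # leaf: treat as a gene
  unique(as.character(unlist(
    lapply(parts, expand_to_genes, contains = contains, seen = c(seen, entity))
  )))
}

# Reactions in which a gene takes part, directly or through sets/complexes.
reactions_for_gene <- function(gene, reactions, contains) {
  has_gene <- vapply(reactions,
                     function(r) gene %in% expand_to_genes(r, contains),
                     logical(1))
  reactions[has_gene]
}

reactions <- c("reaction_1", "reaction_2")
expand_to_genes("reaction_1", contains)
reactions_for_gene("gene_B", reactions, contains)
```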
