Selecting few ids from reactome.db
@lluis-revilla-sancho
Last seen 6 days ago
European Union

I am running into a problem with `select` while testing a new package:

library("reactome.db")

genes.id <- as.character(c(52, 11342, 80895, 57654, 58493, 1164, 1163, 4150,  2130, 159))

select(reactome.db, keys = genes.id, keytype = "ENTREZID", columns = "REACTOMEID")

'select()' returned 1:many mapping between keys and columns
   ENTREZID REACTOMEID
1        52       <NA>
2     11342       <NA>

# Some data as expected

select(reactome.db, keys = genes.id[1:5], keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> select(reactome.db, keys = genes.id[1:7], keytype = "ENTREZID", columns = "REACTOMEID")

'select()' returned 1:many mapping between keys and columns
   ENTREZID REACTOMEID
1        52       <NA>
2     11342       <NA>

# Some data as expected

> select(reactome.db, keys = genes.id[1:6], keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> select(reactome.db, keys = genes.id[2:7], keytype = "ENTREZID", columns = "REACTOMEID")
'select()' returned 1:many mapping between keys and columns
   ENTREZID REACTOMEID
1     11342       <NA>
2     80895       <NA>
3     57654       <NA> # Some data

> select(reactome.db, keys = genes.id[3:6], keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> traceback()
7: stop(msg)
6: .testForValidKeys(x, keys, keytype, fks)
5: testSelectArgs(x, keys = keys, cols = cols, keytype = keytype)
4: .selectReact(x, keys, columns, keytype)
3: .selectWarnReact(x, keys, columns, keytype, kt = kt, ...)
2: select(reactome.db, keys = genes.id[3:6], keytype = "ENTREZID",
       columns = "REACTOMEID")
1: select(reactome.db, keys = genes.id[3:6], keytype = "ENTREZID",
       columns = "REACTOMEID")
> sessionInfo()

R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] reactome.db_1.58.0   AnnotationDbi_1.36.0 IRanges_2.8.1       
[4] S4Vectors_0.12.0     Biobase_2.34.0       BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] DBI_0.5-1     memoise_1.0.0 Rcpp_0.12.8   RSQLite_1.1-1 digest_0.6.10

Is this the expected behavior?

annotationdbi reactome.db bug • 1.7k views
@willemligtenberg-6989
Last seen 6.4 years ago
Netherlands

I would say this is indeed expected behaviour.

Several of the gene ids you list apparently have no pathway associated with them (at least none that is returned by the queries I use to generate this package).

If that is itself an error, could you please show me where you found a connection between one of those Entrez Gene ids and a specific pathway in Reactome? Then I can review my queries and adapt them if required.

I myself use mget normally in this case, and there you can specify what it should do if no mapping was found:

mget(x = genes.id, reactomeEXTID2PATHID, ifnotfound = NA)
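To illustrate what `ifnotfound` does, here is a minimal sketch using base R's `mget` on a plain environment. The environment `toy_map` and the value `"placeholder_pathway"` are invented for the illustration and are not Reactome data; the `mget` method for `reactomeEXTID2PATHID` is assumed to treat `ifnotfound` in the same way.

```r
# toy_map stands in for an annotation map; its single entry is a placeholder,
# not a real Reactome pathway id.
toy_map <- new.env()
assign("1163", "placeholder_pathway", envir = toy_map)

# "52" is absent from toy_map, so mget() fills its slot with NA
# instead of raising an error.
res <- mget(c("1163", "52"), envir = toy_map, ifnotfound = NA)

# Flag which keys actually had a mapping.
mapped <- vapply(res, function(p) !all(is.na(p)), logical(1))
```

The result is a list named by key, so unmapped ids can be detected afterwards rather than stopping the whole query.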

Indeed, some genes don't have a pathway associated with them. But `select` throws an error for some subsets of the ids instead of returning NA, whereas with the whole set it returns NAs. Shouldn't it always return NA for those ids, whether I `select` a subset or all of them?


There are two competing ideas here. You are saying that you put in real official Entrez Gene IDs, and so shouldn't reactome.db then return a data.frame with NA values?

An alternative viewpoint is to say that unless some of the IDs you have passed in appear to be valid Entrez Gene IDs, a more useful thing to do would be to tell you that none of the IDs appear to be valid Entrez Gene IDs, which is what happens. If we were to do what you suggest, then you could do something like

egids <- c("not", "really", "entrez","gene","IDs","at","all")
select(reactome.db, egids, "REACTOMEID","ENTREZID")
  ENTREZID   REACTOMEID
1   not           <NA>
2   really        <NA>
3   entrez        <NA>
4   gene          <NA>
5   IDs           <NA>
6   at            <NA>
7   all           <NA>

I suppose there are valid arguments for either behavior, but to me they boil down to two general ideas:

  1. We are all grownups here, and so the package should just process input data without too much error checking. If people want to query for random things, they should get NA values back.
  2. Not everybody is fully cognizant of how these annotation db packages work, and it might be helpful to have some error checking. If somebody inputs a set of keys for which there are no matches, that may well be an error on their part, and it would be helpful to let them know.

I tend towards #2, personally, because people do make mistakes and it is helpful to let them know when it appears they have done so.


I understand what you mean here, but just to clarify, neither mget, nor the select function is defined in an annotation package. This behaviour comes from AnnotationDbi::select and from BiocGenerics::mget.

@martin-morgan-1513
Last seen 6 weeks ago
United States

Although all the ids may be valid Entrez ids, most of them do not appear to be in the version of reactome.db available in the package:

> genes.id %in% keys(reactome.db, "ENTREZID")
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

Filter the ids first

genes.id <- genes.id[genes.id %in% keys(reactome.db, "ENTREZID")]

and then query

select(reactome.db, keys = genes.id, keytype = "ENTREZID", columns = "REACTOMEID")
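If a data.frame of NAs is preferred over an error when no key is known, the check above could be wrapped in a small helper. This is only a sketch: `safe_select` and its arguments are hypothetical names, not part of AnnotationDbi, and it relies on the behaviour reported in this thread that `select` returns NA rows for unknown keys as long as at least one key is valid.

```r
# Hypothetical helper: run `query` only if at least one key is known,
# otherwise return one NA row per key instead of raising an error.
safe_select <- function(keys, valid_keys, query) {
  keys <- as.character(keys)
  if (!any(keys %in% as.character(valid_keys))) {
    return(data.frame(ENTREZID = keys,
                      REACTOMEID = NA_character_,
                      stringsAsFactors = FALSE))
  }
  query(keys)
}

# Usage with reactome.db, skipped when the package is not installed.
if (requireNamespace("reactome.db", quietly = TRUE)) {
  db <- reactome.db::reactome.db
  genes.id <- as.character(c(52, 11342, 80895, 57654, 58493,
                             1164, 1163, 4150, 2130, 159))
  res <- safe_select(
    genes.id[3:6],
    AnnotationDbi::keys(db, keytype = "ENTREZID"),
    function(k) AnnotationDbi::select(db, keys = k, keytype = "ENTREZID",
                                      columns = "REACTOMEID")
  )
}
```

Passing the query as a function keeps the helper independent of the annotation package, at the cost of hard-coding the two column names in the fallback data.frame.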