I am having some problems with `select` while testing a new package:
```r
> library("reactome.db")
> genes.id <- as.character(c(52, 11342, 80895, 57654, 58493, 1164, 1163, 4150, 2130, 159))
> select(reactome.db, keys = genes.id, keytype = "ENTREZID", columns = "REACTOMEID")
'select()' returned 1:many mapping between keys and columns
  ENTREZID REACTOMEID
1       52       <NA>
2    11342       <NA>
# Some data as expected
> select(reactome.db, keys = genes.id[1:5], keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> select(reactome.db, keys = genes.id[1:7], keytype = "ENTREZID", columns = "REACTOMEID")
'select()' returned 1:many mapping between keys and columns
  ENTREZID REACTOMEID
1       52       <NA>
2    11342       <NA>
# Some data as expected
> select(reactome.db, keys = genes.id[1:6], keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> select(reactome.db, keys = genes.id[2:7], keytype = "ENTREZID", columns = "REACTOMEID")
'select()' returned 1:many mapping between keys and columns
  ENTREZID REACTOMEID
1    11342       <NA>
2    80895       <NA>
3    57654       <NA>
# Some data
> select(reactome.db, keys = genes.id[3:6], keytype = "ENTREZID", columns = "REACTOMEID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
> traceback()
7: stop(msg)
6: .testForValidKeys(x, keys, keytype, fks)
5: testSelectArgs(x, keys = keys, cols = cols, keytype = keytype)
4: .selectReact(x, keys, columns, keytype)
3: .selectWarnReact(x, keys, columns, keytype, kt = kt, ...)
2: select(reactome.db, keys = genes.id[3:6], keytype = "ENTREZID",
       columns = "REACTOMEID")
1: select(reactome.db, keys = genes.id[3:6], keytype = "ENTREZID",
       columns = "REACTOMEID")
```
```
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] reactome.db_1.58.0   AnnotationDbi_1.36.0 IRanges_2.8.1
[4] S4Vectors_0.12.0     Biobase_2.34.0       BiocGenerics_0.20.0

loaded via a namespace (and not attached):
[1] DBI_0.5-1     memoise_1.0.0 Rcpp_0.12.8   RSQLite_1.1-1 digest_0.6.10
```
Is this the expected behavior?
Indeed, some genes don't have a pathway associated with them. But `select` fails with an error for some subsets of them instead of returning NA, whereas with the whole set it returns NAs. Shouldn't it always return NA, whether I `select` a subset or all of them?
There are two competing ideas here. Your view is that you passed in real, official Entrez Gene IDs, so reactome.db should return a data.frame with NA values.
An alternative viewpoint is to say that unless some of the IDs you have passed in appear to be valid Entrez Gene IDs, a more useful thing to do would be to tell you that none of the IDs appear to be valid Entrez Gene IDs, which is what happens. If we were to do what you suggest, then you could do something like
I suppose there are valid arguments for either behavior, but to me they boil down to two general ideas:
I tend towards #2, personally, because people do make mistakes and it is helpful to let them know when it appears they have done so.
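As a rough illustration of how one might check for such mistakes before calling `select`, here is a minimal sketch. `partition_keys` is a hypothetical helper (it is not part of AnnotationDbi), and the toy key list is made up; in practice the valid keys would presumably come from `keys(reactome.db, keytype = "ENTREZID")`, as the error message suggests.

```r
# Hypothetical helper (not part of AnnotationDbi): split query IDs by whether
# the annotation source knows them.
partition_keys <- function(query, valid) {
  known <- query %in% valid
  list(known = query[known], unknown = query[!known])
}

# Toy stand-in for the real key list, made up for illustration only.
# With the real package one would use something like:
#   valid <- AnnotationDbi::keys(reactome.db, keytype = "ENTREZID")
toy_valid <- c("A", "B", "C")

parts <- partition_keys(c("A", "X", "Y"), toy_valid)
parts$known    # IDs present in the annotation source
parts$unknown  # IDs absent from it (typos, or genes with no mapping)
```

An empty `known` element corresponds to the situation in which `select` reports that none of the keys are valid.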
I understand what you mean here, but just to clarify, neither `mget` nor `select` is defined in an annotation package. This behaviour comes from `AnnotationDbi::select` and from `BiocGenerics::mget`.
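For anyone who prefers a data.frame of NAs in every case, a small wrapper around `select` is one option. This is only a sketch under stated assumptions: `select_or_na` is a hypothetical name, and the `valid_keys` and `select_fun` arguments are added here purely so the idea can be tried without the annotation package installed. By default they fall back to `AnnotationDbi::keys` and `AnnotationDbi::select`.

```r
# Hypothetical wrapper (not part of AnnotationDbi): return NA rows instead of
# an error when none of the supplied keys are known to the annotation source.
select_or_na <- function(db, keys, keytype, columns,
                         valid_keys = AnnotationDbi::keys(db, keytype = keytype),
                         select_fun = AnnotationDbi::select) {
  if (!any(keys %in% valid_keys)) {
    # Build the all-NA result by hand: one row per key, one NA column per
    # requested column.
    out <- data.frame(keys, stringsAsFactors = FALSE)
    names(out) <- keytype
    for (col in columns) {
      out[[col]] <- rep(NA_character_, length(keys))
    }
    return(out)
  }
  # At least one key is known, so the regular call is used unchanged.
  select_fun(db, keys = keys, keytype = keytype, columns = columns)
}

# Demonstration with stand-ins (no Bioconductor packages required):
# fake_select mimics the error raised when no key is valid.
fake_select <- function(db, keys, keytype, columns) {
  stop("None of the keys entered are valid keys for '", keytype, "'.")
}
res <- select_or_na(NULL, c("X", "Y"), "ENTREZID", "REACTOMEID",
                    valid_keys = c("A", "B"), select_fun = fake_select)
res

# Intended real-world use (assumes reactome.db is loaded):
#   select_or_na(reactome.db, genes.id[1:5], "ENTREZID", "REACTOMEID")
```

Checking the keys up front, rather than catching the error with `tryCatch`, means unrelated errors (for example a wrong `keytype`) still surface. The trade-off is the one discussed above: a mistyped set of IDs will silently come back as NAs.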