Question

Error in .testForValidKeys "org.At.tair.db"

Zoe • 0

Hi,

I've run into a problem with the select function in org.At.tair.db. I'm looping through a list of Arabidopsis TAIR gene IDs and retrieving annotation info. It works fine until I reach a certain point in my list (index 812), and then I get an error saying that the key is not valid. As far as I can tell there's nothing wrong with the gene ID: it is in the same format as all the others, and I looked it up in TAIR and got a result. I've put my code below; I hope it's clear enough. The first section is not very relevant, as I'm just matching P. patens gene names to Arabidopsis homologue IDs before retrieving the info from org.At.tair.db.

Anyone know why this is happening? Any help would be appreciated.


#problematic code
for (i in 1:gene_num_contrast1_sig){
  match_ID_sig <- Arabidopsis_ID[Arabidopsis_ID$V1 %like% paste(Pp_ID_contrast1_sig[i], "V3", sep = ""), ]        
  At_match_sig <- as.character(match_ID_sig[1,2])
  At_ID_contrast1_sig <- c(At_ID_contrast1_sig, strsplit(At_match_sig, "\\.")[[1]][1])
  if (is.na(At_ID_contrast1_sig[i])){
    ann_contrast1_sig<- c(ann_contrast1_sig, NA)
    }else{ 
    ann<- select(org.At.tair.db, keytype = "TAIR", keys = At_ID_contrast1_sig[i], columns = "GENENAME")
    ann_contrast1_sig<- c(ann_contrast1_sig, ann[1,2])}
}

#runs as expected until I get this error code when i=812
#Error in .testForValidKeys(x, keys, keytype, fks) : 
  #None of the keys entered are valid keys for 'TAIR'. Please use the keys method to see a listing of valid arguments.

#The gene ID at this index is "AT4G04605"


> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggridges_0.5.3        ggnewscale_0.4.5      DOSE_3.16.0           ggplot2_3.3.3        
 [5] enrichplot_1.10.2     data.table_1.14.0     stringr_1.4.0         org.At.tair.db_3.12.0
 [9] AnnotationDbi_1.52.0  IRanges_2.24.1        S4Vectors_0.28.1      Biobase_2.50.0       
[13] BiocGenerics_0.36.1  

loaded via a namespace (and not attached):
  [1] fgsea_1.16.0                colorspace_1.4-1            ellipsis_0.3.2             
  [4] qvalue_2.22.0               XVector_0.30.0              GenomicRanges_1.42.0       
  [7] rstudioapi_0.13             farver_2.1.0                graphlayouts_0.7.1         
 [10] ggrepel_0.9.1               bit64_4.0.5                 fansi_0.4.2                
 [13] scatterpie_0.1.6            xml2_1.3.2                  splines_4.0.2              
 [16] cachem_1.0.4                GOSemSim_2.16.1             geneplotter_1.68.0         
 [19] polyclip_1.10-0             Rsamtools_2.6.0             annotate_1.68.0            
 [22] GO.db_3.12.1                dbplyr_2.1.1                ggforce_0.3.3              
 [25] BiocManager_1.30.16         compiler_4.0.2              httr_1.4.2                 
 [28] rvcheck_0.1.8               assertthat_0.2.1            Matrix_1.2-18              
 [31] fastmap_1.1.0               cli_2.5.0                   tweenr_1.0.2               
 [34] prettyunits_1.1.1           tools_4.0.2                 igraph_1.2.6               
 [37] gtable_0.3.0                glue_1.4.2                  GenomeInfoDbData_1.2.4     
 [40] reshape2_1.4.4              DO.db_2.9                   dplyr_1.0.6                
 [43] rappdirs_0.3.3              fastmatch_1.1-0             Rcpp_1.0.6                 
 [46] vctrs_0.3.8                 Biostrings_2.58.0           rtracklayer_1.49.5         
 [49] ggraph_2.0.5                lifecycle_1.0.0             clusterProfiler_3.18.1     
 [52] XML_3.99-0.6                zlibbioc_1.36.0             MASS_7.3-51.6              
 [55] scales_1.1.1                tidygraph_1.2.0             hms_1.1.0                  
 [58] MatrixGenerics_1.2.1        SummarizedExperiment_1.20.0 RColorBrewer_1.1-2         
 [61] curl_4.3.1                  memoise_2.0.0               gridExtra_2.3              
 [64] downloader_0.4              biomaRt_2.46.3              stringi_1.5.3              
 [67] RSQLite_2.2.7               genefilter_1.72.1           GenomicFeatures_1.42.3     
 [70] BiocParallel_1.24.1         GenomeInfoDb_1.26.7         rlang_0.4.11               
 [73] pkgconfig_2.0.3             matrixStats_0.58.0          bitops_1.0-7               
 [76] lattice_0.20-41             purrr_0.3.4                 GenomicAlignments_1.26.0   
 [79] cowplot_1.1.1               shadowtext_0.0.8            bit_4.0.4                  
 [82] tidyselect_1.1.1            plyr_1.8.6                  magrittr_2.0.1             
 [85] DESeq2_1.30.1               R6_2.5.0                    generics_0.1.0             
 [88] DelayedArray_0.16.3         DBI_1.1.1                   withr_2.4.2                
 [91] pillar_1.6.1                survival_3.1-12             RCurl_1.98-1.3             
 [94] tibble_3.1.1                crayon_1.4.1                utf8_1.2.1                 
 [97] BiocFileCache_1.14.0        viridis_0.6.1               progress_1.2.2             
[100] locfit_1.5-9.4              grid_4.0.2                  blob_1.2.1                 
[103] digest_0.6.27               xtable_1.8-4                tidyr_1.1.3                
[106] openssl_1.4.4               munsell_0.5.0               viridisLite_0.4.0          
[109] askpass_1.1

updated 2.7 years ago by James W. MacDonald 65k • written 2.7 years ago by Zoe • 0

score 0 · Answer 1 · 2021-07-26

You should normally try to vectorize your code rather than relying on for loops. R is much faster at for loops than it used to be, but vectorized code is still considerably faster. In addition, the select function is fine with any vector of keys that contains at least one valid ID, whereas it throws an error when none of the keys are valid. So if you look IDs up one at a time, you will error out as soon as you hit an ID that doesn't map, but if you pass the whole vector you simply get an NA back for that ID.

As an example:

> select(org.Hs.eg.db, as.character(1:15), "GENENAME")
'select()' returned 1:1 mapping between keys and columns
   ENTREZID                                GENENAME
1         1                  alpha-1-B glycoprotein
2         2                   alpha-2-macroglobulin
3         3      alpha-2-macroglobulin pseudogene 1
4         4                                    <NA>
5         5                                    <NA>
6         6                                    <NA>
7         7                                    <NA>
8         8                                    <NA>
9         9                   N-acetyltransferase 1
10       10                   N-acetyltransferase 2
11       11          N-acetyltransferase pseudogene
12       12                serpin family A member 3
13       13               arylacetamide deacetylase
14       14 angio associated migratory cell protein
15       15        aralkylamine N-acetyltransferase

## Versus

> for(i in 1:15) select(org.Hs.eg.db, as.character(i), "GENENAME")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.
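
Applied to the loop in the question, a vectorized lookup might look something like the sketch below. This is only an illustration under a few assumptions: at_ids is a made-up stand-in for At_ID_contrast1_sig (with NA where no homologue was found), and gene_names plays the role of ann_contrast1_sig. Filtering against keys() first is an extra guard I've added so that select never sees a vector with no valid keys at all; it also lets you check directly whether a given ID is present in the installed annotation package.

```r
suppressPackageStartupMessages({
  library(AnnotationDbi)
  library(org.At.tair.db)
})

# Hypothetical input standing in for At_ID_contrast1_sig;
# NA marks a gene for which no Arabidopsis homologue was found.
at_ids <- c("AT1G01010", NA, "AT4G04605", "AT1G01020")

# All TAIR IDs known to the installed annotation package.
valid <- AnnotationDbi::keys(org.At.tair.db, keytype = "TAIR")

# Is the problematic ID actually a key in this release?
"AT4G04605" %in% valid

# Query only the distinct, non-missing IDs that the package knows about.
query <- intersect(unique(at_ids[!is.na(at_ids)]), valid)

# Start with NA everywhere, then fill in whatever select() can map.
gene_names <- rep(NA_character_, length(at_ids))
if (length(query) > 0) {
  ann <- AnnotationDbi::select(org.At.tair.db, keys = query,
                               keytype = "TAIR", columns = "GENENAME")
  # select() can return several rows per key; match() takes the first
  # row for each ID, mirroring ann[1, 2] in the original loop.
  gene_names <- ann$GENENAME[match(at_ids, ann$TAIR)]
}

gene_names
```

The result has one entry per input ID, in the original order, with NA for both missing homologues and IDs that are not keys in the package, so no special-casing is needed inside a loop.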