Question: Problems with Organism.dplyr
James W. MacDonald (47k), United States, wrote 6 days ago:

Using devel:

> src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")
Using temporary cache /data3/tmp/RtmpL4Yacm/BiocFileCache
adding rname 'dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite'
Error in bfcrpath(bfc, rnames = txdb_name) :
  not all 'rnames' found or unique.
> traceback()
6: stop(e)
5: value[[3L]](cond)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
3: tryCatchList(expr, classes, parentenv, handlers)
2: tryCatch({
       suppressWarnings({
           bfcrpath(bfc, rnames = txdb_name)
       })
   }, error = function(e) {
       test <- identical(conditionMessage(e), "all 'rnames' not found or valid.")
       if (!test)
           stop(e)
       bfcnew(bfc, txdb_name)
   })
1: src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-devel/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-devel/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [3] GenomicFeatures_1.33.0                 
 [4] AnnotationDbi_1.43.1                   
 [5] Biobase_2.41.2                         
 [6] GenomicRanges_1.33.13                  
 [7] GenomeInfoDb_1.17.1                    
 [8] IRanges_2.15.16                        
 [9] S4Vectors_0.19.19                      
[10] BiocGenerics_0.27.1                    
[11] bindrcpp_0.2.2                         
[12] Organism.dplyr_1.9.0                   
[13] AnnotationFilter_1.5.2                 
[14] dplyr_0.7.6                            

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.11.6 progress_1.2.0             
 [3] tidyselect_0.2.4            purrr_0.2.5                
 [5] lattice_0.20-35             BiocFileCache_1.5.5        
 [7] rtracklayer_1.41.3          blob_1.1.1                 
 [9] XML_3.98-1.13               rlang_0.2.1                
[11] pillar_1.3.0                glue_1.3.0                 
[13] DBI_1.0.0                   BiocParallel_1.15.8        
[15] rappdirs_0.3.1              bit64_0.9-7                
[17] dbplyr_1.2.2                matrixStats_0.54.0         
[19] GenomeInfoDbData_1.1.0      bindr_0.1.1                
[21] stringr_1.3.1               zlibbioc_1.27.0            
[23] Biostrings_2.49.1           memoise_1.1.0              
[25] biomaRt_2.37.4              Rcpp_0.12.18               
[27] DelayedArray_0.7.24         org.Hs.eg.db_3.6.0         
[29] XVector_0.21.3              bit_1.1-14                 
[31] Rsamtools_1.33.3            hms_0.4.2                  
[33] digest_0.6.15               stringi_1.2.4              
[35] grid_3.5.0                  tools_3.5.0                
[37] bitops_1.0-6                magrittr_1.5               
[39] lazyeval_0.2.1              RCurl_1.95-4.11            
[41] tibble_1.4.2                RSQLite_2.1.1              
[43] crayon_1.3.4                pkgconfig_2.0.1            
[45] Matrix_1.2-14               prettyunits_1.0.2          
[47] assertthat_0.2.0            httr_1.3.1                 
[49] R6_2.2.2                    GenomicAlignments_1.17.3   
[51] compiler_3.5.0            

 

And using release:

> src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

> genes(src)
Error: Can't convert a NULL to a quosure

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-3.5.0/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] bindrcpp_0.2.2                         
 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [3] GenomicFeatures_1.32.0                 
 [4] AnnotationDbi_1.42.1                   
 [5] Biobase_2.40.0                         
 [6] GenomicRanges_1.32.3                   
 [7] GenomeInfoDb_1.16.0                    
 [8] IRanges_2.14.10                        
 [9] S4Vectors_0.18.3                       
[10] BiocGenerics_0.26.0                    
[11] Organism.dplyr_1.8.0                   
[12] AnnotationFilter_1.4.0                 
[13] dplyr_0.7.6                            
[14] BiocInstaller_1.30.0                   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17                lattice_0.20-35            
 [3] circlize_0.4.4              prettyunits_1.0.2          
 [5] Rsamtools_1.32.2            Biostrings_2.48.0          
 [7] assertthat_0.2.0            digest_0.6.15              
 [9] BiocFileCache_1.4.0         R6_2.2.2                   
[11] RSQLite_2.1.1               httr_1.3.1                 
[13] pillar_1.2.3                GlobalOptions_0.1.0        
[15] zlibbioc_1.26.0             rlang_0.2.1                
[17] progress_1.2.0              lazyeval_0.2.1             
[19] blob_1.1.1                  GetoptLong_0.1.7           
[21] Matrix_1.2-14               BiocParallel_1.14.2        
[23] stringr_1.3.1               RCurl_1.95-4.10            
[25] bit_1.1-14                  biomaRt_2.36.1             
[27] DelayedArray_0.6.1          compiler_3.5.0             
[29] rtracklayer_1.40.3          pkgconfig_2.0.1            
[31] shape_1.4.4                 tidyselect_0.2.4           
[33] SummarizedExperiment_1.10.1 tibble_1.4.2               
[35] GenomeInfoDbData_1.1.0      matrixStats_0.53.1         
[37] XML_3.98-1.11               crayon_1.3.4               
[39] dbplyr_1.2.1                GenomicAlignments_1.16.0   
[41] bitops_1.0-6                rappdirs_0.3.1             
[43] grid_3.5.0                  DBI_1.0.0                  
[45] magrittr_1.5                stringi_1.2.3              
[47] XVector_0.20.0              org.Hs.eg.db_3.6.0         
[49] rjson_0.2.20                RColorBrewer_1.1-2         
[51] tools_3.5.0                 bit64_0.9-7                
[53] glue_1.2.0                  purrr_0.2.5                
[55] hms_0.4.2                   colorspace_1.3-2           
[57] ComplexHeatmap_1.18.1       memoise_1.1.0              
[59] bindr_0.1.1                
>

 

modified 6 days ago by Martin Morgan ♦♦ 22k • written 6 days ago by James W. MacDonald (47k)
Martin Morgan ♦♦ (22k), United States, wrote 6 days ago:
Thanks Jim

For your devel scenario, it's likely that you have duplicate records in your cache for the synthetic db, as I have for mm10 below:

> library(BiocFileCache); library(tidyverse)
> bfc = BiocFileCache()
> bfcquery(bfc, "dplyr.") %>% select(rid, rname, fpath)
# A tibble: 4 x 3
  rid   rname                                           fpath       
  <chr> <chr>                                           <chr>       
1 BFC21 dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite  5b377bf41425
2 BFC22 dplyr.TxDb.Hsapiens.UCSC.hg19.knownGene.sqlite  ae07e7aa933 
3 BFC30 dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite 512746ee2cfb
4 BFC74 dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite 65c942e3f1  

You need to clean this up by hand by removing one or both of the duplicate records. If you want to inspect the files first, bfccache(bfc) gives the cache directory and the 'fpath' column names the corresponding files.

> bfcremove(bfc, c("BFC30", "BFC74"))
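
With a larger cache, the duplicates can be picked out programmatically rather than by eye. The following is only a minimal sketch in base R: `find_duplicate_rnames()` is a hypothetical helper (not part of BiocFileCache), and the `listing` data frame is a stand-in built from the query output shown above. The calls against a real cache are left as comments.

```r
# Hypothetical helper: return the rows of a cache listing whose 'rname'
# occurs more than once. 'tbl' is any data.frame-like object with 'rid'
# and 'rname' columns, e.g. the result of bfcquery(bfc, "dplyr.").
find_duplicate_rnames <- function(tbl) {
    tbl <- as.data.frame(tbl)
    dup <- tbl$rname %in% tbl$rname[duplicated(tbl$rname)]
    tbl[dup, , drop = FALSE]
}

# Stand-in for the bfcquery() result (values copied from the output above).
listing <- data.frame(
    rid = c("BFC21", "BFC22", "BFC30", "BFC74"),
    rname = c(
        "dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite",
        "dplyr.TxDb.Hsapiens.UCSC.hg19.knownGene.sqlite",
        "dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
        "dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite"
    ),
    stringsAsFactors = FALSE
)

dups <- find_duplicate_rnames(listing)
print(dups$rid)

# Against a real cache (not run here):
# library(BiocFileCache)
# bfc <- BiocFileCache()
# dups <- find_duplicate_rnames(bfcquery(bfc, "dplyr."))
# bfcremove(bfc, dups$rid)   # removes every copy; the db is rebuilt on next use
```

Removing every copy is the blunt option; to keep one, drop its 'rid' from the vector passed to bfcremove().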

The release scenario also affects devel; it is caused by a more stringent condition in the underlying dplyr code. This has been corrected in Organism.dplyr v1.9.2 in devel, along with better reporting of the duplicate-record error and the ability to control which BiocFileCache is used. The fix will be ported to release shortly as 1.8.1.
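
To check whether an installed copy already carries the fix, the version numbers above can be compared directly. This is a rough sketch only: `has_fix()` is a hypothetical helper, and it assumes the release port really does land as 1.8.1 as stated, with 1.9.x being the devel series.

```r
# Hypothetical helper: TRUE if 'version' is at or past the fixed versions
# named above (1.9.2 in devel; 1.8.1 in release, assuming the port lands there).
has_fix <- function(version) {
    v <- package_version(version)
    in_devel <- v >= "1.9.2"
    in_release <- v >= "1.8.1" && v < "1.9.0"
    in_devel || in_release
}

print(has_fix("1.8.0"))
print(has_fix("1.9.2"))

# For the installed package (not run here):
# has_fix(packageVersion("Organism.dplyr"))
```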

 

modified 6 days ago • written 6 days ago by Martin Morgan ♦♦ 22k

Huh. It seems a restart was all I needed for the devel issue. Sorry for the noise.

> library(BiocFileCache)
Loading required package: dbplyr
       
> bfc <- BiocFileCache()
## got no tidyverse, let's wrap in as.data.frame() ;P
> as.data.frame(bfcquery(bfc, "dplyr"))[,c("rid","rname","fpath")]
   rid                                          rname        fpath
1 BFC1 dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite 796150c6fd7c
> library(Organism.dplyr)

> src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

 

written 6 days ago by James W. MacDonald (47k)

Thank you for posting about this issue. Any criticism of the package is helpful. It allowed Martin and me to discuss and make a few small changes that needed to be made.

modified 6 days ago • written 6 days ago by daniel.vantwisk (30)