Problems with Organism.dplyr
@james-w-macdonald-5106

Using devel:

> src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")
Using temporary cache /data3/tmp/RtmpL4Yacm/BiocFileCache
adding rname 'dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite'
Error in bfcrpath(bfc, rnames = txdb_name) :
  not all 'rnames' found or unique.
> traceback()
6: stop(e)
5: value[[3L]](cond)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
3: tryCatchList(expr, classes, parentenv, handlers)
2: tryCatch({
       suppressWarnings({
           bfcrpath(bfc, rnames = txdb_name)
       })
   }, error = function(e) {
       test <- identical(conditionMessage(e), "all 'rnames' not found or valid.")
       if (!test)
           stop(e)
       bfcnew(bfc, txdb_name)
   })
1: src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-devel/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-devel/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [2] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [3] GenomicFeatures_1.33.0                 
 [4] AnnotationDbi_1.43.1                   
 [5] Biobase_2.41.2                         
 [6] GenomicRanges_1.33.13                  
 [7] GenomeInfoDb_1.17.1                    
 [8] IRanges_2.15.16                        
 [9] S4Vectors_0.19.19                      
[10] BiocGenerics_0.27.1                    
[11] bindrcpp_0.2.2                         
[12] Organism.dplyr_1.9.0                   
[13] AnnotationFilter_1.5.2                 
[14] dplyr_0.7.6                            

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.11.6 progress_1.2.0             
 [3] tidyselect_0.2.4            purrr_0.2.5                
 [5] lattice_0.20-35             BiocFileCache_1.5.5        
 [7] rtracklayer_1.41.3          blob_1.1.1                 
 [9] XML_3.98-1.13               rlang_0.2.1                
[11] pillar_1.3.0                glue_1.3.0                 
[13] DBI_1.0.0                   BiocParallel_1.15.8        
[15] rappdirs_0.3.1              bit64_0.9-7                
[17] dbplyr_1.2.2                matrixStats_0.54.0         
[19] GenomeInfoDbData_1.1.0      bindr_0.1.1                
[21] stringr_1.3.1               zlibbioc_1.27.0            
[23] Biostrings_2.49.1           memoise_1.1.0              
[25] biomaRt_2.37.4              Rcpp_0.12.18               
[27] DelayedArray_0.7.24         org.Hs.eg.db_3.6.0         
[29] XVector_0.21.3              bit_1.1-14                 
[31] Rsamtools_1.33.3            hms_0.4.2                  
[33] digest_0.6.15               stringi_1.2.4              
[35] grid_3.5.0                  tools_3.5.0                
[37] bitops_1.0-6                magrittr_1.5               
[39] lazyeval_0.2.1              RCurl_1.95-4.11            
[41] tibble_1.4.2                RSQLite_2.1.1              
[43] crayon_1.3.4                pkgconfig_2.0.1            
[45] Matrix_1.2-14               prettyunits_1.0.2          
[47] assertthat_0.2.0            httr_1.3.1                 
[49] R6_2.2.2                    GenomicAlignments_1.17.3   
[51] compiler_3.5.0            

 

And using release:

> src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

> genes(src)
Error: Can't convert a NULL to a quosure

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

Matrix products: default
BLAS: /data/oldR/R-3.5.0/lib64/R/lib/libRblas.so
LAPACK: /data/oldR/R-3.5.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] bindrcpp_0.2.2                         
 [2] TxDb.Hsapiens.UCSC.hg38.knownGene_3.4.0
 [3] GenomicFeatures_1.32.0                 
 [4] AnnotationDbi_1.42.1                   
 [5] Biobase_2.40.0                         
 [6] GenomicRanges_1.32.3                   
 [7] GenomeInfoDb_1.16.0                    
 [8] IRanges_2.14.10                        
 [9] S4Vectors_0.18.3                       
[10] BiocGenerics_0.26.0                    
[11] Organism.dplyr_1.8.0                   
[12] AnnotationFilter_1.4.0                 
[13] dplyr_0.7.6                            
[14] BiocInstaller_1.30.0                   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17                lattice_0.20-35            
 [3] circlize_0.4.4              prettyunits_1.0.2          
 [5] Rsamtools_1.32.2            Biostrings_2.48.0          
 [7] assertthat_0.2.0            digest_0.6.15              
 [9] BiocFileCache_1.4.0         R6_2.2.2                   
[11] RSQLite_2.1.1               httr_1.3.1                 
[13] pillar_1.2.3                GlobalOptions_0.1.0        
[15] zlibbioc_1.26.0             rlang_0.2.1                
[17] progress_1.2.0              lazyeval_0.2.1             
[19] blob_1.1.1                  GetoptLong_0.1.7           
[21] Matrix_1.2-14               BiocParallel_1.14.2        
[23] stringr_1.3.1               RCurl_1.95-4.10            
[25] bit_1.1-14                  biomaRt_2.36.1             
[27] DelayedArray_0.6.1          compiler_3.5.0             
[29] rtracklayer_1.40.3          pkgconfig_2.0.1            
[31] shape_1.4.4                 tidyselect_0.2.4           
[33] SummarizedExperiment_1.10.1 tibble_1.4.2               
[35] GenomeInfoDbData_1.1.0      matrixStats_0.53.1         
[37] XML_3.98-1.11               crayon_1.3.4               
[39] dbplyr_1.2.1                GenomicAlignments_1.16.0   
[41] bitops_1.0-6                rappdirs_0.3.1             
[43] grid_3.5.0                  DBI_1.0.0                  
[45] magrittr_1.5                stringi_1.2.3              
[47] XVector_0.20.0              org.Hs.eg.db_3.6.0         
[49] rjson_0.2.20                RColorBrewer_1.1-2         
[51] tools_3.5.0                 bit64_0.9-7                
[53] glue_1.2.0                  purrr_0.2.5                
[55] hms_0.4.2                   colorspace_1.3-2           
[57] ComplexHeatmap_1.18.1       memoise_1.1.0              
[59] bindr_0.1.1                
>

 

@martin-morgan-1513

Thanks, Jim.

For your devel scenario, it's likely that you have duplicate records for the synthetic db in your BiocFileCache, as I do for mm10 below:

> library(BiocFileCache); library(tidyverse)
> bfc = BiocFileCache()
> bfcquery(bfc, "dplyr.") %>% select(rid, rname, fpath)
# A tibble: 4 x 3
  rid   rname                                           fpath       
  <chr> <chr>                                           <chr>       
1 BFC21 dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite  5b377bf41425
2 BFC22 dplyr.TxDb.Hsapiens.UCSC.hg19.knownGene.sqlite  ae07e7aa933 
3 BFC30 dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite 512746ee2cfb
4 BFC74 dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite 65c942e3f1  

You need to clean this up by hand, perhaps by removing one or both records, and perhaps after consulting bfccache(bfc) and the 'fpath' of the corresponding files:

> bfcremove(bfc, c("BFC30", "BFC74"))
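Spotting the duplicated records by eye works for a small cache; for a larger one, a few lines of base R can list them. This is only a sketch: the data frame mimics the rid and rname columns of the bfcquery() output above, and duplicated_rnames() is a made-up helper name, not part of BiocFileCache.

```r
# Mock of the bfcquery() listing above (rid and rname columns only).
cache <- data.frame(
    rid   = c("BFC21", "BFC22", "BFC30", "BFC74"),
    rname = c("dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite",
              "dplyr.TxDb.Hsapiens.UCSC.hg19.knownGene.sqlite",
              "dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite",
              "dplyr.TxDb.Mmusculus.UCSC.mm10.knownGene.sqlite"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: keep every row whose rname occurs more than once.
duplicated_rnames <- function(tbl) {
    dup <- tbl$rname[duplicated(tbl$rname)]
    tbl[tbl$rname %in% dup, c("rid", "rname"), drop = FALSE]
}

dups <- duplicated_rnames(cache)
print(dups)
```

With a real cache, something like as.data.frame(bfcquery(bfc, "dplyr.")) could presumably be passed in place of the mock, and the reported rid values handed to bfcremove() once the corresponding files have been checked.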

The release scenario also affects devel; it is caused by a more stringent condition in the underlying dplyr code. This has been corrected in Organism.dplyr v1.9.2 in devel, along with better reporting of the duplicate-record error and the ability to control which BiocFileCache is used. The fix will be ported to release shortly as 1.8.1.
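The traceback in the question hints at why the duplicate records surfaced as a bare error: the handler recovers only when the condition message equals one specific string, and re-throws anything else. A minimal base-R sketch of that pattern follows. The two messages are copied from the thread, but lookup() and find_or_add() are hypothetical stand-ins, and which situation raises which message is an assumption made for illustration.

```r
# Messages copied from the thread; their mapping to situations is assumed.
MSG_MISSING   <- "all 'rnames' not found or valid."
MSG_DUPLICATE <- "not all 'rnames' found or unique."

# Hypothetical stand-in for a cache lookup by resource name.
lookup <- function(rnames, wanted) {
    n <- sum(rnames == wanted)
    if (n == 0L) stop(MSG_MISSING)
    if (n > 1L) stop(MSG_DUPLICATE)
    wanted
}

# Same shape as the handler in the traceback: recover from the one
# expected message, re-throw every other error unchanged.
find_or_add <- function(rnames, wanted) {
    tryCatch(lookup(rnames, wanted), error = function(e) {
        if (!identical(conditionMessage(e), MSG_MISSING))
            stop(e)
        paste0("new:", wanted)   # placeholder for adding a new record
    })
}

added   <- find_or_add(c("a", "b"), "c")                  # recovered
dup_msg <- tryCatch(find_or_add(c("a", "a"), "a"),
                    error = conditionMessage)             # re-thrown
```

Matching on message text is brittle, since any rewording upstream changes which branch runs; signalling and catching a dedicated condition class is the usual sturdier alternative.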

 


Huh. It seems a restart was all I needed for the devel issue. Sorry for the noise.

> library(BiocFileCache)
Loading required package: dbplyr
       
> bfc <- BiocFileCache()
## got no tidyverse, let's wrap in as.data.frame() ;P
> as.data.frame(bfcquery(bfc, "dplyr"))[,c("rid","rname","fpath")]
   rid                                          rname        fpath
1 BFC1 dplyr.TxDb.Hsapiens.UCSC.hg38.knownGene.sqlite 796150c6fd7c
> library(Organism.dplyr)

> src <- src_organism("TxDb.Hsapiens.UCSC.hg38.knownGene")

 


Thank you for posting about this issue. Any criticism of the package is helpful. It allowed Martin and me to discuss and make a few small changes that were needed.
