TxDb error: external pointer is not valid
2
0
Entering edit mode
@o-william-mcclung-22004
Last seen 4.5 years ago
United States

On my box within R I attempted to create a TxDb object from a GFF file with

human_txdb <- makeTxDbFromGFF(file="human.gff",format="gff3")

and this crashed the R process. (Presumably an out-of-memory error.) Here's my environment:

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2                  pillar_1.4.2               
 [3] compiler_3.6.1              GenomeInfoDb_1.20.0        
 [5] XVector_0.24.0              GenomicFeatures_1.36.4     
 [7] prettyunits_1.0.2           bitops_1.0-6               
 [9] tools_3.6.1                 zlibbioc_1.30.0            
[11] progress_1.2.2              zeallot_0.1.0              
[13] biomaRt_2.40.4              digest_0.6.20              
[15] bit_1.1-14                  lattice_0.20-38            
[17] RSQLite_2.1.2               memoise_1.1.0              
[19] tibble_2.1.3                pkgconfig_2.0.2            
[21] rlang_0.4.0                 Matrix_1.2-17              
[23] DelayedArray_0.10.0         DBI_1.0.0                  
[25] parallel_3.6.1              GenomeInfoDbData_1.2.1     
[27] rtracklayer_1.44.4          httr_1.4.1                 
[29] stringr_1.4.0               Biostrings_2.52.0          
[31] S4Vectors_0.22.1            vctrs_0.2.0                
[33] IRanges_2.18.2              hms_0.5.1                  
[35] grid_3.6.1                  stats4_3.6.1               
[37] bit64_0.9-7                 Biobase_2.44.0             
[39] R6_2.4.0                    AnnotationDbi_1.46.1       
[41] BiocParallel_1.18.1         XML_3.98-1.20              
[43] blob_1.2.0                  magrittr_1.5               
[45] matrixStats_0.55.0          GenomicAlignments_1.20.1   
[47] Rsamtools_2.0.0             backports_1.1.4            
[49] BiocGenerics_0.30.0         GenomicRanges_1.36.1       
[51] SummarizedExperiment_1.14.1 assertthat_0.2.1           
[53] stringi_1.4.3               RCurl_1.95-4.12            
[55] crayon_1.3.4               

The box has 8 GB of memory and 8 cores. human.gff is 1.1 GB.

I then tried the same thing on an HPC cluster whose nodes have 16 cores and 64 GB of memory. This succeeded and within R I saved the result as an R object:

saveRDS(human_txdb,file = "human_txdb.rds")

I then scp'd human_txdb.rds to my box and within R executed

human_txdb <- readRDS(file="human_txdb.rds")

This succeeded. However

human_txdb

failed with

TxDb object:
Error: external pointer is not valid

As an experiment, I went back to the HPC cluster and changed the R script :

human_txdb <- makeTxDbFromGFF(file="human.gff",format="gff3")
saveRDS(human_txdb,file = "human_txdb.rds")
human_txdb <- readRDS(file = "human_txdb.rds")
human_txdb

This created a more explicit error:

Error in result_create(conn@ptr, statement) : 
  external pointer is not valid
Calls: <Anonymous> ... .local -> new -> initialize -> initialize -> result_create
Execution halted

Thus the problem seems not to be related to the scp. Here's the R environment on the HPC cluster node:

R version 3.5.1 (2018-07-02)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Scientific Linux release 6.10 (Carbon)

Matrix products: default
BLAS/LAPACK: /util/opt/anaconda/deployed-conda-envs/packages/bioconductor/envs/bioconductor-3.8/lib/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] systemPipeR_1.16.0          ShortRead_1.40.0           
 [3] GenomicAlignments_1.18.0    SummarizedExperiment_1.12.0
 [5] DelayedArray_0.8.0          matrixStats_0.54.0         
 [7] BiocParallel_1.16.2         Rsamtools_1.34.0           
 [9] Biostrings_2.50.1           XVector_0.22.0             
[11] GenomicFeatures_1.34.1      AnnotationDbi_1.44.0       
[13] Biobase_2.42.0              GenomicRanges_1.34.0       
[15] GenomeInfoDb_1.18.1         IRanges_2.16.0             
[17] S4Vectors_0.20.1            BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] httr_1.4.0             edgeR_3.24.1           bit64_0.9-7           
 [4] splines_3.5.1          assertthat_0.2.0       latticeExtra_0.6-28   
 [7] RBGL_1.58.1            blob_1.1.1             GenomeInfoDbData_1.2.0
[10] Category_2.48.0        progress_1.2.2         backports_1.1.3       
[13] pillar_1.4.2           RSQLite_2.1.1          lattice_0.20-38       
[16] glue_1.3.0             limma_3.38.3           digest_0.6.18         
[19] checkmate_1.8.5        RColorBrewer_1.1-2     colorspace_1.4-0      
[22] Matrix_1.2-15          plyr_1.8.4             GSEABase_1.44.0       
[25] XML_3.98-1.16          pkgconfig_2.0.2        pheatmap_1.0.12       
[28] biomaRt_2.38.0         genefilter_1.64.0      zlibbioc_1.28.0       
[31] GO.db_3.7.0            purrr_0.3.2            xtable_1.8-3          
[34] scales_1.0.0           brew_1.0-6             tibble_2.0.1          
[37] annotate_1.60.0        ggplot2_3.1.0          lazyeval_0.2.1        
[40] survival_2.43-3        magrittr_1.5           crayon_1.3.4          
[43] memoise_1.1.0          hwriter_1.3.2          GOstats_2.48.0        
[46] graph_1.60.0           data.table_1.12.0      tools_3.5.1           
[49] prettyunits_1.0.2      hms_0.4.2              BBmisc_1.11           
[52] stringr_1.4.0          sendmailR_1.2-1        munsell_0.5.0         
[55] locfit_1.5-9.1         compiler_3.5.1         rlang_0.3.4           
[58] grid_3.5.1             RCurl_1.95-4.11        rjson_0.2.20          
[61] AnnotationForge_1.24.0 base64enc_0.1-3        bitops_1.0-6          
[64] gtable_0.2.0           DBI_1.0.0              R6_2.4.0              
[67] dplyr_0.8.0.1          rtracklayer_1.42.1     bit_1.1-14            
[70] Rgraphviz_2.26.0       stringi_1.2.4          BatchJobs_1.7         
[73] Rcpp_1.0.0             tidyselect_0.2.5

Any debugging hints will be gratefully received.

software error GenomicFeatures • 3.0k views
ADD COMMENT
0
Entering edit mode

I wanted to present an alternative based on the TxDb that I used since I was storing other information as well. for my example I was storing the genome as DNA string set and the GFF as well the TxDb for tracks information.

genome = redDNAStringSet("genome.fna") gff = readGFF("genome.gff") txdb = makeTxDbFromGFF("genome.gff", format="gff3") rds_data = list()

save everything in a list - as lists will put out contents of the TxDb not use a pointer

rds_data[[1]] = list(genome, gff, as.list(txdb)) saveRDS(rds_data, "fileName.RDS")

read back in the RDS you already saved

dat = readRDS("fileName.RDS") newTxDb = do.call(makeTxDb, dat[[1]][[3]])

This can be found in the methods section of the TxDB class functions. This way you can save multiple types at once, but is a little more complicated than just saveDb / loadDb

ADD REPLY
4
Entering edit mode
@martin-morgan-1513
Last seen 6 weeks ago
United States

Use saveDb() / loadDb() to save TxDb objects. The object is a reference to a sqlite data base. saveDb() saves the database. loadDb() loads a reference to the database and 'wraps' it in a new TxDb object.

ADD COMMENT
0
Entering edit mode
@o-william-mcclung-22004
Last seen 4.5 years ago
United States

Thanks go to Martin Morgan for providing the correct answer. This works:

human_txdb <- makeTxDbFromGFF(file = "human.gff",format="gff3")
saveDb(human_txdb,file = "human_txdb.sqlite")    # export
human_txdb <- loadDb(file = "human_txdb.sqlite") # import
ADD COMMENT

Login before adding your answer.

Traffic: 846 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6