Failure to save txDb using saveDb in BiocFileCache location other than /tmp
@3f155d9f
Last seen 3.6 years ago
Netherlands

Hi,

I have an issue creating a tximeta transcriptome with BiocFileCache (R 4.0.1, tximeta 1.8.4 and a tximeta development checkout, on Ubuntu 20.04 LTS with SQLite 3.3). The cache file is touched but not filled, and an rds file referencing the GTF file of the transcriptome is created. Together with Mike Love we narrowed it down to the saveDb step of creating the transcriptome (tximeta GitHub issue 56).

I tried multiple mounts on our file system. The only location that works is /tmp, which makes it really weird. All target locations are readable and writable by the user, and a small test write to the cache location also works. The generated error and warning messages seem to point to some sort of database locking issue; the SQLite error message "not an error" is obviously not very helpful. Has anybody encountered this before? Any help resolving this issue would be appreciated.

Regards, Judith

Writing to file system location

> bfc <- BiocFileCache::BiocFileCache("/scratch/pmaj_index_cache")
> txdb <- makeTxDbFromGFF(file=gtfPath, dataSource="EnsemblDbv97", organism="Parus major", chrominfo=chromInd)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> loc <- BiocFileCache::bfcnew(bfc, rname="testing", ext=".sqlite")
> saveDb(txdb, file=loc)
Error: Failed to copy all data:
not an error
In addition: Warning message:
Couldn't set synchronous mode: database is locked
Use `synchronous` = NULL to turn off this warning.
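
One way to check whether SQLite itself can write to a location, independent of GenomicFeatures and tximeta, is a tiny probe database. The sketch below is only an illustration: `probe_sqlite_write` is a hypothetical helper name, not part of any package, and it assumes the DBI and RSQLite packages are installed. It creates a throwaway .sqlite file in the given directory, writes ten rows, reads the row count back, and reports any error message instead of stopping.

```r
# Hypothetical helper (not from any package): attempt a small SQLite write
# in `dir` and return whether it succeeded plus any error message.
probe_sqlite_write <- function(dir) {
  path <- tempfile(pattern = "probe_", tmpdir = dir, fileext = ".sqlite")
  on.exit(unlink(path), add = TRUE)  # remove the probe file afterwards
  con <- NULL
  tryCatch({
    con <- DBI::dbConnect(RSQLite::SQLite(), dbname = path)
    DBI::dbWriteTable(con, "probe", data.frame(x = 1:10))
    n <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM probe")$n
    list(ok = (n == 10), message = "")
  },
  error = function(e) list(ok = FALSE, message = conditionMessage(e)),
  finally = if (!is.null(con)) DBI::dbDisconnect(con))
}
```

Calling it once on the failing cache directory and once on /tmp, for example `probe_sqlite_write("/scratch/pmaj_index_cache")` and `probe_sqlite_write("/tmp")`, would show whether the locking problem also appears for a plain RSQLite write. A probe this small may still succeed where a 57M database fails, so a passing result does not rule the location out.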

With /tmp location

> bfc <- BiocFileCache::BiocFileCache("/tmp/pmaj_index_cache")
using temporary cache /tmp/Rtmp2O0DyA/BiocFileCache
> loc <- BiocFileCache::bfcnew(bfc, rname="testing", ext=".sqlite")
> saveDb(txdb, file=loc)
TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: EnsemblDbv97
# Organism: Parus major
# Taxonomy ID: 9157
# miRBase build ID: NA
# Genome: NA
# Nb of transcripts: 33036
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2021-04-12 15:14:08 +0200 (Mon, 12 Apr 2021)
# GenomicFeatures version at creation time: 1.42.3
# RSQLite version at creation time: 2.2.6
# DBSCHEMAVERSION: 1.2

This is what the cache location looks like on the file system after the failed write (note the 0-byte .sqlite file):

total 51K
drwxr-xr-x 5 me domain users  39 Apr 12 19:26 ..
-rw-r--r-- 1 me domain users 347 Apr 12 19:26 a2b377a501945_a2b377a501945.rds
-rw-r--r-- 1 me domain users 20K Apr 12 19:26 BiocFileCache.sqlite
-rw-r--r-- 1 me domain users   0 Apr 12 19:26 a2b375abb18b7_a2b375abb18b7.sqlite
drwxr-xr-x 2 me domain users   5 Apr 12 19:26 .

On /tmp it looks like this

total 58M
-rw-r--r-- 1 me domain users  347 Apr 12 19:29 a2b3738290f91_a2b3738290f91.rds
drwx------ 3 me domain users 4.0K Apr 12 19:29 ..
-rw-r--r-- 1 me domain users  57M Apr 12 19:29 a2b3746738891_a2b3746738891.sqlite
-rw-r--r-- 1 me domain users  20K Apr 12 19:29 BiocFileCache.sqlite
drwxr-xr-x 2 me domain users 4.0K Apr 12 19:29 .
-rw-r--r-- 1 me domain users 669K Apr 12 19:29 a2b373c53219a_a2b373c53219a.rds

A simple test write using bfc to the same cache location gives no issue:

> bfc <- BiocFileCache::BiocFileCache("/scratch/pmaj_index_cache")
> loc <- BiocFileCache::bfcnew(bfc, rname="testing2")
> x <- 1:10
> save(x, file=loc)
> bfcinfo(bfc)
# A tibble: 1 x 10
  rid   rname  create_time access_time rpath  rtype fpath last_modified_t… etag
  <chr> <chr>  <chr>       <chr>       <chr>  <chr> <chr>            <dbl> <chr>
1 BFC1  testi… 2021-04-12… 2021-04-12… ~/.ca… rela… a2b3…               NA NA
# … with 1 more variable: expires <dbl>
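
Since plain file writes to the cache location succeed while the SQLite write does not, one possible stopgap is to build the database in a local temporary file and copy the finished file into place. This is only a sketch under that assumption: `write_then_copy` is a hypothetical helper, it does not address the underlying cause, and it uses base R only.

```r
# Hypothetical helper: run `writer(path)` against a local temporary file,
# then copy the finished file to `dest`.
write_then_copy <- function(writer, dest) {
  ext <- tools::file_ext(dest)
  tmp <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
  on.exit(unlink(tmp), add = TRUE)  # clean up the local copy
  writer(tmp)
  if (!file.exists(tmp)) stop("writer did not create ", tmp)
  ok <- file.copy(tmp, dest, overwrite = TRUE)
  if (!ok) stop("could not copy to ", dest)
  invisible(dest)
}

# Usage with the objects from this post (txdb, loc); saveDb is from AnnotationDbi:
# write_then_copy(function(p) AnnotationDbi::saveDb(txdb, file = p), loc)
```

The copy avoids SQLite's locking on the target file system during the write, but reading the database from that location later could still run into the same problem.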

There is one warning when running tximeta directly with /tmp (rather than the step-by-step version) that might be relevant, but I'm not sure where it is generated:

> se_s1 <- tximeta(samples_s1, useHub=FALSE)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
found matching linked transcriptome:
[ Ensembl - Parus major - release 97 ]
building EnsDb with 'ensembldb' package
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
.
.
.
Generating index ... OK
  -------------
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
generating transcript ranges
Warning messages:
1: closing unused connection 3 (ftp://ftp.ensembl.org/pub/release-97/mysql/)
3: call dbDisconnect() when finished working with a connection
> sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] tximeta_1.9.6

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.2.1          Biobase_2.50.0
 [3] httr_1.4.2                    jsonlite_1.7.2
 [5] bit64_4.0.5                   AnnotationHub_2.22.0
 [7] shiny_1.6.0                   assertthat_0.2.1
 [9] interactiveDisplayBase_1.28.0 askpass_1.1
[11] BiocManager_1.30.12           stats4_4.0.1
[13] BiocFileCache_1.14.0          blob_1.2.1
[15] GenomeInfoDbData_1.2.4        Rsamtools_2.6.0
[17] yaml_2.2.1                    progress_1.2.2
[19] BiocVersion_3.12.0            lattice_0.20-41
[21] pillar_1.5.1                  RSQLite_2.2.6
[23] glue_1.4.2                    digest_0.6.27
[25] GenomicRanges_1.42.0          promises_1.2.0.1
[27] XVector_0.30.0                htmltools_0.5.1.1
[29] httpuv_1.5.5                  Matrix_1.2-18
[31] XML_3.99-0.6                  pkgconfig_2.0.3
[33] biomaRt_2.46.3                zlibbioc_1.36.0
[35] purrr_0.3.4                   xtable_1.8-4
[37] later_1.1.0.1                 BiocParallel_1.24.1
[39] tibble_3.1.0                  openssl_1.4.3
[41] AnnotationFilter_1.14.0       generics_0.1.0
[43] IRanges_2.24.1                ellipsis_0.3.1
[45] cachem_1.0.4                  SummarizedExperiment_1.20.0
[47] GenomicFeatures_1.42.3        lazyeval_0.2.2
[49] BiocGenerics_0.36.0           magrittr_2.0.1
[51] crayon_1.4.1                  mime_0.10
[53] memoise_2.0.0                 fansi_0.4.2
[55] xml2_1.3.2                    tools_4.0.1
[57] prettyunits_1.1.1             hms_1.0.0
[59] lifecycle_1.0.0               matrixStats_0.58.0
[61] stringr_1.4.0                 S4Vectors_0.28.1
[63] DelayedArray_0.16.3           ensembldb_2.14.0
[65] AnnotationDbi_1.52.0          Biostrings_2.58.0
[67] compiler_4.0.1                GenomeInfoDb_1.26.7
[69] rlang_0.4.10                  grid_4.0.1
[71] RCurl_1.98-1.3                tximport_1.18.0
[73] rstudioapi_0.13               rappdirs_0.3.3
[75] bitops_1.0-6                  DBI_1.1.1
[77] curl_4.3                      R6_2.5.0
[79] GenomicAlignments_1.26.0      dplyr_1.0.5
[81] rtracklayer_1.50.0            fastmap_1.1.0
[83] bit_4.0.4                     utf8_1.2.1
[85] ProtGenerics_1.22.0           stringi_1.5.3
[87] parallel_4.0.1                Rcpp_1.0.6
[89] vctrs_0.3.7                   dbplyr_2.1.1
[91] tidyselect_1.1.0
@3f155d9f
Last seen 3.6 years ago
Netherlands

The issue has been resolved. The firewall to the storage server resulted in timeouts for certain write requests.

Thanks Judith for the follow-up and for the bug reports. It's useful for others who may find themselves in this situation.
