autonomics: example("read_somascan", package = "autonomics") fails, but not on Bioc check servers
@henrik-bengtsson-4333
United States

Hello, I'm trying to understand why the Bioconductor checks don't pick up an error that my reverse-dependency checks detect on several different systems:

> example("read_somascan", package = "autonomics")
rd_sms> file <- download_data('atkin18.somascan.adat')
rd_sms> read_somascan(file, pca = TRUE, fit = 'limma', block = 'Subject_ID')
Error in (1 + f_col):n_col : NA/NaN argument

Details: Debugging reveals that the downloaded file has CRCRLF line endings (sic!), which I don't think read_somascan() expects, since it uses readLines() to read the lines. My guess is that the source data file is corrupt:

$ wget https://bitbucket.org/graumannlabtools/autonomics/downloads/atkin18.somascan.adat
$ ls -l atkin18.somascan.adat 
-rw-rw-r-- 1 henrik henrik 674340 Jul 12 03:02 atkin18.somascan.adat
$ md5sum atkin18.somascan.adat 
6253e8fe04448fc1c43b73baf45ba62e  atkin18.somascan.adat
$ file atkin18.somascan.adat 
atkin18.somascan.adat: ASCII text, with very long lines (11067), with CRLF, CR line terminators
$ cat -A atkin18.somascan.adat
[ shows line endings ^M^M$ == CRCRLF ]
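
For anyone who wants to check the same thing from within R, here is a minimal sketch. It uses a small stand-in file written to a temporary location, not the real atkin18.somascan.adat, and the variable names are only illustrative. It counts CR CR LF byte sequences and then normalizes them as a possible workaround. The readLines() documentation says LF, CRLF, and CR are each accepted as an end-of-line marker, so a CRCRLF ending may well be read as a line followed by an extra empty line, which could be what trips up the parser.

```r
# Build a small stand-in file whose lines end in CR CR LF (not the real .adat file).
tmp <- tempfile(fileext = ".adat")
writeBin(charToRaw("a\tb\r\r\nc\td\r\r\n"), tmp)

# Count CR CR LF (0d 0d 0a) sequences in the raw bytes.
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
idx <- seq_len(max(length(bytes) - 2L, 0L))
n_crcrlf <- sum(bytes[idx] == as.raw(0x0d) &
                bytes[idx + 1L] == as.raw(0x0d) &
                bytes[idx + 2L] == as.raw(0x0a))

# Possible workaround: collapse any run of CRs before an LF into a single LF,
# then split into lines.
text <- readChar(tmp, nchars = file.size(tmp), useBytes = TRUE)
clean_lines <- strsplit(gsub("\r+\n", "\n", text), "\n", fixed = TRUE)[[1]]
```

A non-zero n_crcrlf on the downloaded file would confirm the odd line endings without relying on file or cat -A.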

My questions:

  1. Is this just me? I can reproduce it on a CentOS 7, a Rocky 8, and an Ubuntu 22.04 system at different sites, all running R 4.3.2 and Bioc 3.18 (autonomics 1.10.2).

  2. Why isn't this error occurring on the Bioconductor check servers ( https://bioconductor.org/checkResults/release/bioc-LATEST/autonomics/ )?

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.8 (Green Obsidian)

Matrix products: default
BLAS:   /wynton/home/cbi/shared/software/CBI/_rocky8/R-4.3.2-gcc10/lib64/R/lib/libRblas.so
LAPACK: /wynton/home/cbi/shared/software/CBI/_rocky8/R-4.3.2-gcc10/lib64/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8    LC_NUMERIC=C            LC_TIME=C
 [4] LC_COLLATE=en_US.UTF-8  LC_MONETARY=C           LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C              LC_NAME=C               LC_ADDRESS=C
[10] LC_TELEPHONE=C          LC_MEASUREMENT=C        LC_IDENTIFICATION=C

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] autonomics_1.10.2

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0            dplyr_1.1.4
 [3] blob_1.2.4                  filelock_1.0.2
 [5] R.utils_2.12.3              bitops_1.0-7
 [7] fastmap_1.1.1               RCurl_1.98-1.13
 [9] BiocFileCache_2.10.1        assertive.files_0.0-2
[11] digest_0.6.33               lifecycle_1.0.4
[13] statmod_1.5.0               RSQLite_2.3.3
[15] magrittr_2.0.3              compiler_4.3.2
[17] rlang_1.1.2                 tools_4.3.2
[19] utf8_1.2.4                  assertive.base_0.0-9
[21] data.table_1.14.8           assertive.sets_0.0-3
[23] S4Arrays_1.2.0              bit_4.0.5
[25] curl_5.1.0                  DelayedArray_0.28.0
[27] abind_1.4-5                 withr_2.5.2
[29] purrr_1.0.2                 BiocGenerics_0.48.1
[31] desc_1.4.2                  R.oo_1.25.0
[33] grid_4.3.2                  stats4_4.3.2
[35] fansi_1.0.5                 colorspace_2.1-0
[37] progressr_0.14.0            edgeR_4.0.2
[39] ggplot2_3.4.4               scales_1.3.0
[41] MultiAssayExperiment_1.28.0 SummarizedExperiment_1.32.0
[43] debugme_1.1.0               cli_3.6.1
[45] crayon_1.5.2                generics_0.1.3
[47] parsedate_1.3.1             cranlike_1.0.2
[49] httr_1.4.7                  readxl_1.4.3
[51] DBI_1.1.3                   cachem_1.0.8
[53] zlibbioc_1.48.0             parallel_4.3.2
[55] cellranger_1.1.0            XVector_0.42.0
[57] matrixStats_1.1.0           vctrs_0.6.4
[59] Matrix_1.6-3                IRanges_2.36.0
[61] S4Vectors_0.40.2            bit64_4.0.5
[63] ggrepel_0.9.4               locfit_1.5-9.8
[65] limma_3.58.1                tidyr_1.3.0
[67] glue_1.6.2                  crancache_0.0.0.9001
[69] rematch2_2.1.2              stringi_1.8.2
[71] gtable_0.3.4                GenomeInfoDb_1.38.1
[73] GenomicRanges_1.54.1        munsell_0.5.0
[75] tibble_3.2.1                pillar_1.9.0
[77] rappdirs_0.3.3              GenomeInfoDbData_1.2.11
[79] R6_2.5.1                    dbplyr_2.4.0
[81] rprojroot_2.0.4             lattice_0.22-5
[83] Biobase_2.62.0              R.methodsS3_1.8.2
[85] memoise_2.0.1               Rcpp_1.0.11
[87] gridExtra_2.3               SparseArray_1.2.2
[89] MatrixGenerics_1.14.0       assertive.numbers_0.0-2
[91] pkgconfig_2.0.3

EDIT: Replaced example("create_design", package = "autonomics") with example("read_somascan", package = "autonomics"), which has the same problem.

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

It looks like autonomics uses BiocFileCache to create a local copy of the data and avoid repeated downloads. Given the (relatively) recent last update on that file, I'd guess that the build system is using a cached version from before it was changed on the remote server. As far as I can see, autonomics:::download_data() has a commented-out call to bfcneedsupdate(). My understanding is that this should address the issue, but maybe it wasn't working as expected.
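
To illustrate the idea, here is a rough sketch of a cache lookup that re-downloads when bfcneedsupdate() reports the cached copy as stale. This is not the autonomics implementation: download_fresh() and its arguments are hypothetical names, and the helper assumes the BiocFileCache package is installed and the URL is reachable.

```r
# Hypothetical helper (not autonomics code): return a local path for `url`,
# re-downloading when the remote copy appears newer than the cached one.
download_fresh <- function(url, cache_dir = tempfile("bfc_")) {
    bfc <- BiocFileCache::BiocFileCache(cache_dir, ask = FALSE)
    hits <- BiocFileCache::bfcquery(bfc, url, field = "fpath", exact = TRUE)
    if (nrow(hits) == 0L) {
        # First use: register the resource, which also downloads it.
        path <- BiocFileCache::bfcadd(bfc, rname = basename(url), fpath = url)
        return(unname(path))
    }
    rid <- hits$rid[1]
    # bfcneedsupdate() compares cached metadata with the remote resource;
    # it can return NA when the server gives no usable information.
    stale <- BiocFileCache::bfcneedsupdate(bfc, rid)
    if (isTRUE(unname(stale))) {
        BiocFileCache::bfcdownload(bfc, rid, ask = FALSE)
    }
    unname(BiocFileCache::bfcrpath(bfc, rids = rid))
}
```

Note that an NA from bfcneedsupdate() is treated as "keep the cached copy" here; whether that is the right default depends on how much the remote server can be trusted to report modification times.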


Thanks Mike for diving deeper into this. Your findings make sense. It sounds like the Bioconductor checks need to be modified so that they neither retain state from previous runs nor rely on side effects such as file caches. It would probably be better to test with a temporary file cache, so that the cache is only active during a single package check. I'll try to find the proper issue tracker to post this proposal.
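
As a sketch of what a per-check temporary cache could look like: tools::R_user_dir() honors the R_USER_CACHE_DIR environment variable, so redirecting it to a fresh temporary directory should isolate any package that puts its cache in the default R user cache location. Whether a given package (or the build system) actually resolves its cache this way is an assumption here, and the variable names are only illustrative.

```r
# Point the per-user cache root at a fresh temporary directory.
tmp_cache <- tempfile("check_cache_")
dir.create(tmp_cache)
old <- Sys.getenv("R_USER_CACHE_DIR", unset = NA)
Sys.setenv(R_USER_CACHE_DIR = tmp_cache)

# Packages that resolve their cache via tools::R_user_dir() now land under tmp_cache.
cache_path <- tools::R_user_dir("BiocFileCache", which = "cache")

# ... run the package check or example here ...

# Restore the previous setting afterwards.
if (is.na(old)) Sys.unsetenv("R_USER_CACHE_DIR") else Sys.setenv(R_USER_CACHE_DIR = old)
```

Code that passes an explicit cache directory would not be affected by this and would need its own override.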
