Hello, I'm trying to understand why the Bioc checks don't pick up an error that my reverse-dependency checks detect on several different systems:
```r
> example("read_somascan", package = "autonomics")
rd_sms> file <- download_data('atkin18.somascan.adat')
rd_sms> read_somascan(file, pca = TRUE, fit = 'limma', block = 'Subject_ID')
Error in (1 + f_col):n_col : NA/NaN argument
```
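Details: Debugging this reveals that the downloaded file has CRCRLF line endings (sic!), which I don't think `read_somascan()` expects, since it reads the file with `readLines()`. Because `readLines()` accepts LF, CRLF and CR as end-of-line markers, each CRCRLF is read as a line ending followed by an extra empty line. My guess is that the source data file is corrupt: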
```shell
$ wget https://bitbucket.org/graumannlabtools/autonomics/downloads/atkin18.somascan.adat
$ ls -l atkin18.somascan.adat
-rw-rw-r-- 1 henrik henrik 674340 Jul 12 03:02 atkin18.somascan.adat
$ md5sum atkin18.somascan.adat
6253e8fe04448fc1c43b73baf45ba62e atkin18.somascan.adat
$ file atkin18.somascan.adat
atkin18.somascan.adat: ASCII text, with very long lines (11067), with CRLF, CR line terminators
$ cat -A atkin18.somascan.adat
[ shows line endings ^M^M$ == CRCRLF ]
```
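A minimal shell sketch for checking a downloaded copy for the `^M^M$` pattern above and for writing a copy with plain LF endings, e.g. to see whether `read_somascan()` gets past this error on normalized input. The function names `count_crcrlf` and `normalize_eol` are hypothetical, and the sketch assumes POSIX `grep`, `awk` and `printf`; it is a local diagnostic, not a fix for the data file or the package.

```shell
# Hypothetical helpers for diagnosing CRCRLF line endings.

# Print how many lines of file "$1" end in (at least) two CRs before the LF,
# i.e. the pattern that `cat -A` shows as ^M^M$.
# Note: grep exits with status 1 when the count is 0.
count_crcrlf() {
  # LC_ALL=C makes grep compare raw bytes.
  LC_ALL=C grep -c "$(printf '\r\r')\$" "$1"
}

# Write a copy of "$1" to "$2" with all trailing CRs stripped from each line,
# so both CRCRLF and CRLF endings become plain LF.
normalize_eol() {
  LC_ALL=C awk '{ sub(/\r+$/, ""); print }' "$1" > "$2"
}
```

For example, `count_crcrlf atkin18.somascan.adat`, then `normalize_eol atkin18.somascan.adat atkin18.lf.adat` (the output file name is arbitrary). Since `awk` splits records on LF only, a CR that is not at the end of a line is left untouched.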
My questions:

1. Is this just me? I can reproduce it on CentOS 7, Rocky 8, and Ubuntu 22.04 systems at different sites, all running R 4.3.2 and Bioc 3.18 (autonomics 1.10.2).
2. Why isn't this error occurring on the Bioconductor check servers (https://bioconductor.org/checkResults/release/bioc-LATEST/autonomics/)?
```r
> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.8 (Green Obsidian)
Matrix products: default
BLAS: /wynton/home/cbi/shared/software/CBI/_rocky8/R-4.3.2-gcc10/lib64/R/lib/libRblas.so
LAPACK: /wynton/home/cbi/shared/software/CBI/_rocky8/R-4.3.2-gcc10/lib64/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] autonomics_1.10.2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 dplyr_1.1.4
[3] blob_1.2.4 filelock_1.0.2
[5] R.utils_2.12.3 bitops_1.0-7
[7] fastmap_1.1.1 RCurl_1.98-1.13
[9] BiocFileCache_2.10.1 assertive.files_0.0-2
[11] digest_0.6.33 lifecycle_1.0.4
[13] statmod_1.5.0 RSQLite_2.3.3
[15] magrittr_2.0.3 compiler_4.3.2
[17] rlang_1.1.2 tools_4.3.2
[19] utf8_1.2.4 assertive.base_0.0-9
[21] data.table_1.14.8 assertive.sets_0.0-3
[23] S4Arrays_1.2.0 bit_4.0.5
[25] curl_5.1.0 DelayedArray_0.28.0
[27] abind_1.4-5 withr_2.5.2
[29] purrr_1.0.2 BiocGenerics_0.48.1
[31] desc_1.4.2 R.oo_1.25.0
[33] grid_4.3.2 stats4_4.3.2
[35] fansi_1.0.5 colorspace_2.1-0
[37] progressr_0.14.0 edgeR_4.0.2
[39] ggplot2_3.4.4 scales_1.3.0
[41] MultiAssayExperiment_1.28.0 SummarizedExperiment_1.32.0
[43] debugme_1.1.0 cli_3.6.1
[45] crayon_1.5.2 generics_0.1.3
[47] parsedate_1.3.1 cranlike_1.0.2
[49] httr_1.4.7 readxl_1.4.3
[51] DBI_1.1.3 cachem_1.0.8
[53] zlibbioc_1.48.0 parallel_4.3.2
[55] cellranger_1.1.0 XVector_0.42.0
[57] matrixStats_1.1.0 vctrs_0.6.4
[59] Matrix_1.6-3 IRanges_2.36.0
[61] S4Vectors_0.40.2 bit64_4.0.5
[63] ggrepel_0.9.4 locfit_1.5-9.8
[65] limma_3.58.1 tidyr_1.3.0
[67] glue_1.6.2 crancache_0.0.0.9001
[69] rematch2_2.1.2 stringi_1.8.2
[71] gtable_0.3.4 GenomeInfoDb_1.38.1
[73] GenomicRanges_1.54.1 munsell_0.5.0
[75] tibble_3.2.1 pillar_1.9.0
[77] rappdirs_0.3.3 GenomeInfoDbData_1.2.11
[79] R6_2.5.1 dbplyr_2.4.0
[81] rprojroot_2.0.4 lattice_0.22-5
[83] Biobase_2.62.0 R.methodsS3_1.8.2
[85] memoise_2.0.1 Rcpp_1.0.11
[87] gridExtra_2.3 SparseArray_1.2.2
[89] MatrixGenerics_1.14.0 assertive.numbers_0.0-2
[91] pkgconfig_2.0.3
```
EDIT: Replaced `example("create_design", package = "autonomics")` with `example("read_somascan", package = "autonomics")`, which has the same problem.
Thanks, Mike, for diving deeper into this. Your findings make sense. It sounds like the Bioconductor checks need to be modified so that they carry no memory over from previous runs and do not rely on side effects such as file caches. It would probably be better to test with a temporary file cache, so that the cache is only active during a single package check. I'll try to find the proper issue tracker to post this proposal.
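A minimal sketch of the temporary-cache idea, as a shell wrapper around a check run. It assumes the cached download lives under `tools::R_user_dir()`, which honours the `R_USER_CACHE_DIR` environment variable (R >= 4.0); whether `autonomics`, or the `BiocFileCache` package it loads, actually caches there is an assumption, not something established in this thread. The name `with_temp_cache` is hypothetical.

```shell
# Hypothetical wrapper: run a command with a fresh per-user cache directory
# that is deleted afterwards, so no cached download survives between runs.
# Assumption: the package caches files under tools::R_user_dir(), which
# reads R_USER_CACHE_DIR (R >= 4.0).
with_temp_cache() {
  wtc_dir=$(mktemp -d) || return 1
  # Set the temporary cache location in the environment of this command.
  R_USER_CACHE_DIR="$wtc_dir" "$@"
  wtc_rc=$?
  # Remove the cache so the next run starts from scratch.
  rm -rf "$wtc_dir"
  # Propagate the wrapped command's exit status.
  return "$wtc_rc"
}
```

For example, `with_temp_cache R CMD check autonomics_1.10.2.tar.gz`. Deleting the directory after each run, rather than keeping one cache per build machine, is what keeps a check from silently reusing a file downloaded before the upstream copy changed.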