I tried to use pdInfoBuilder to make a package with the platform information for a budding yeast tiling array from Affymetrix. I wanted to use this package together with Bioconductor's "oligo" package.
I downloaded these files from Affymetrix:
I made a package with the following code:
library(pdInfoBuilder)
baseDir <- '/Users/EricF/projects/ISW2_FUN30/Yoshida_et_al_Pasero_BrdU_Mol_Cell_2014/s_cerevisiae_libraryfile/BPMAP'
bpmap <- list.files(path = baseDir, pattern = '^\\S+_MR_\\S+bpmap$', full.names = TRUE)
stopifnot(length(bpmap) == 1)
cel <- list.files(path = baseDir, pattern = 'CEL$', full.names = TRUE)
stopifnot(length(cel) == 1)
seed <- new("AffyTilingPDInfoPkgSeed",
            bpmapFile = bpmap, celFile = cel,
            author = "Eric Foss", email = "firstname.lastname@example.org",
            biocViews = "AnnotationData", genomebuild = "sacCer1",
            organism = "Yeast", species = "Saccharomyces cerevisiae")
makePdInfoPackage(seed, destDir = ".")
install.packages('pd.sc03b.mr.v03', repos = NULL, type = 'source')
(I have done this both with the "_MF_" and with the "_MR_" .bpmap files, and I had identical problems with each.) Everything seemed to go well, although I did get 11 warnings, each of which was the following:
1: In result_fetch(res@ptr, n = n) : SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery().
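For context, RSQLite raises this warning when a statement that returns no rows (such as CREATE or INSERT) is sent through dbGetQuery() or dbSendQuery(). The sketch below only illustrates the distinction the warning describes. It uses an in-memory SQLite database, and the table `probes` and its columns are hypothetical, not the schema that pdInfoBuilder creates.

```r
library(DBI)

# In-memory SQLite database; the table `probes` and its columns are hypothetical.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Statements that return no rows (CREATE, INSERT, ...) go through dbExecute(),
# which returns the number of rows affected.
dbExecute(con, "CREATE TABLE probes (chrom TEXT, position INTEGER)")
n_inserted <- dbExecute(
  con,
  "INSERT INTO probes VALUES ('chr1', 100), ('chr1', 250), ('chr2', 75)"
)

# Queries that return rows (SELECT) go through dbGetQuery().
res <- dbGetQuery(
  con,
  "SELECT chrom, MAX(position) AS max_pos FROM probes GROUP BY chrom ORDER BY chrom"
)

dbDisconnect(con)
```

Sending the CREATE or INSERT statements above through dbGetQuery() instead is what would trigger the warning.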
When I loaded the package with "library(pd.sc03b.mr.v03)", everything went smoothly. However, when I looked at the probe positions, something was wrong: 5 of the 16 chromosomes had maximum coordinates far greater than the length of the chromosome (in one case, the maximum coordinate on a single chromosome was almost twice the length of the entire genome). The other 11 chromosomes had maximum coordinates that were reasonable. The code below shows an example of how I determined this.
library(oligo)  # provides read.celfiles() and pmPosition()
fls <- list.files(pattern = 'CEL$')  # 15 .CEL files from PMID 24856221
affyRaw <- read.celfiles(filenames = fls, pkgname = 'pd.sc03b.mr.v03')
max(pmPosition(affyRaw))  # 23,944,040: almost twice the length of the S. cerevisiae genome
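The per-chromosome comparison described above can be sketched in base R as follows. This is a minimal illustration, not the exact code used: the helper names `max_pos_by_chr` and `flag_out_of_range` are hypothetical, and the toy chromosome names, positions, and lengths are made up rather than taken from the array or from sacCer1.

```r
# Return the largest probe position on each chromosome.
max_pos_by_chr <- function(chr, pos) {
  stopifnot(length(chr) == length(pos))
  tapply(pos, chr, max)
}

# Return the names of chromosomes whose largest position exceeds the
# reference length given in the named vector `chr_len`.
flag_out_of_range <- function(max_pos, chr_len) {
  chr_len <- chr_len[names(max_pos)]
  names(max_pos)[max_pos > chr_len]
}

# Toy data: made-up values, not real array coordinates or chromosome lengths.
toy_chr <- c("chr1", "chr1", "chr2", "chr2", "chr3")
toy_pos <- c(1200, 98000, 450, 910000, 30500)
toy_len <- c(chr1 = 100000, chr2 = 500000, chr3 = 40000)

toy_max <- max_pos_by_chr(toy_chr, toy_pos)
bad <- flag_out_of_range(toy_max, toy_len)

# With the real data this might look like the line below, assuming pmChr()
# from oligo returns one chromosome label per PM probe for this object:
# max_pos_by_chr(pmChr(affyRaw), pmPosition(affyRaw))
```

Comparing the result against a named vector of reference chromosome lengths shows which chromosomes carry out-of-range coordinates, rather than only the single genome-wide maximum.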
My session info is pasted below:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
pd.sc03b.mr.v03_0.0.1 DBI_1.1.1 oligo_1.56.0 Biobase_2.52.0 oligoClasses_1.54.0 RSQLite_2.2.7
Biostrings_2.60.1 GenomeInfoDb_1.28.1 XVector_0.32.0 IRanges_2.26.0 S4Vectors_0.30.0 BiocGenerics_0.38.0

loaded via a namespace (and not attached):
Rcpp_1.0.6 BiocManager_1.30.16 compiler_4.1.0 MatrixGenerics_1.4.0 bitops_1.0-7
iterators_1.0.13 tools_4.1.0 zlibbioc_1.38.0 bit_4.0.4 preprocessCore_1.54.0
memoise_2.0.0 lattice_0.20-44 ff_4.0.4 pkgconfig_2.0.3 rlang_0.4.11
Matrix_1.3-4 foreach_1.5.1 DelayedArray_0.18.0 rstudioapi_0.13 fastmap_1.1.0
GenomeInfoDbData_1.2.6 affxparser_1.64.0 vctrs_0.3.8 bit64_4.0.5 grid_4.1.0
blob_1.2.1 splines_4.1.0 codetools_0.2-18 matrixStats_0.59.0 GenomicRanges_1.44.0
SummarizedExperiment_1.22.0 RCurl_1.98-1.3 cachem_1.0.5 crayon_1.4.1 affyio_1.62.0
I would very much appreciate any suggestions for fixing this problem.