Greetings,
I'm sharing two errors I encountered while working with DNA methylation cord blood reference datasets in the function FlowSorted.Blood.EPIC::estimateCellCounts2().
I was only able to work out a fix for the first error, which I encountered when running with default settings and not manually specifying the cellTypes argument. I would greatly appreciate any help with the second error.
For my reproducible example, I loaded the dependencies and a test dataset from minfiData as follows.
# load dependencies
libv <- c("minfi","FlowSorted.Blood.EPIC","FlowSorted.CordBlood.450k","minfiData")
invisible(lapply(libv, library, character.only = TRUE))
# load example data
rg <- get(data(RGsetEx))
First, I noticed the cellTypes argument needs to be specified explicitly, otherwise the following happens.
compct <- "CordBlood"
estimateCellCounts2(rg, compositeCellType = compct)
# returns:
[estimateCellCounts2] Consider including 'nRBC' in argument 'cellTypes' for cord blood estimation.
[estimateCellCounts2] Check whether 'Gran' or 'Neu' is present in your reference and adjust argument 'cellTypes' for your estimation.
[convertArray] Casting as IlluminaHumanMethylation450k
Error in estimateCellCounts2(rg.epic, compositeCellType = compct) :
all elements of argument 'cellTypes' needs to be part of the reference phenoData columns 'CellType' (containg the following elements: '')
For anyone reading, or in case the package authors wish to address this, I was able to bypass the error by taking the CellType levels from the reference dataset directly. However, this results in the second error:
cb <- get(data(FlowSorted.CordBlood.450k))
ctv <- unique(cb$CellType)
estimateCellCounts2(rg, compositeCellType = compct, cellTypes = ctv)
# returns:
[convertArray] Casting as IlluminaHumanMethylation450k
[estimateCellCounts2] Combining user data with reference (flow sorted) data.
[estimateCellCounts2] Processing user and reference data together.
[estimateCellCounts2] Picking probes for composition estimation.
Error in p[trainingProbes, ] : subscript out of bounds
When I run traceback, the result is:
traceback()
# returns:
2: pickCompProbes(referenceMset, cellTypes = cellTypes, compositeCellType = compositeCellType,
probeSelect = probeSelect)
1: estimateCellCounts2(rg, compositeCellType = compct, cellTypes = ctv)
The error pertains to an out-of-bounds index when subsetting some object p by something called trainingProbes. I also encountered the same error when trying this approach with the EPIC array example RGChannelSet from minfiDataEPIC (e.g. using library(minfiDataEPIC); rg.epic <- get(data(RGsetEPIC))).
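For readers less familiar with this class of error, here is a minimal base R sketch of one common way "subscript out of bounds" arises when a matrix is subset by row names. It is only an illustration under stated assumptions: the toy matrix p, the probe IDs, and the trainingProbes vector below are made up, and nothing here confirms that this is what happens inside pickCompProbes().

```r
# Toy stand-in for a probes-by-samples matrix; all values and IDs are made up.
p <- matrix(
  c(0.1, 0.8, 0.5, 0.3, 0.9, 0.2),
  nrow = 3,
  dimnames = list(c("cg01", "cg02", "cg03"), c("s1", "s2"))
)

# Hypothetical probe selection that includes an ID absent from rownames(p).
trainingProbes <- c("cg01", "cg03", "cg99")

# Subsetting a matrix by a row name that does not exist raises an error;
# capture the message instead of stopping.
res <- tryCatch(p[trainingProbes, ], error = function(e) conditionMessage(e))
print(res)

# Diagnose: which requested probes are not row names of p?
missingProbes <- setdiff(trainingProbes, rownames(p))
print(missingProbes)

# Subsetting on the intersection succeeds.
okProbes <- intersect(trainingProbes, rownames(p))
pSub <- p[okProbes, , drop = FALSE]
print(dim(pSub))
```

If the same pattern applies here, comparing the selected probe IDs against the row names of the combined methylation matrix with setdiff() would be a reasonable first diagnostic step.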
When I used the above approach (i.e. specifying cellTypes manually) for the other cord blood datasets specified in the docstrings (i.e. "CordBloodNorway", "CordBloodCombined", and "CordTissueAndBlood"), I encountered the same error in all cases except for the "CordBloodNorway" dataset (provided by the FlowSorted.CordBloodNorway.450k package). I wasn't able to successfully download the dataset for "CordBloodCombined" (e.g. using BiocManager::install("FlowSorted.CordBloodCombined.450k")), but that may be because I am running an older version of R (3.6.0) and Bioconductor (3.10). I wasn't able to install the FlowSorted.CordBlood.EPIC package on a machine running a newer R version (v.4.2.0) with Bioconductor 3.15, and so was unable to test my findings in a newer environment at the time of writing.
Thanks in advance for any help!
best regards,
Sean
Here is my session info:
sessionInfo()
# returns:
R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] minfiData_0.32.0 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[3] IlluminaHumanMethylation450kmanifest_0.4.0 FlowSorted.CordBlood.450k_1.14.0
[5] FlowSorted.Blood.EPIC_1.4.1 ExperimentHub_1.12.0
[7] AnnotationHub_2.18.0 BiocFileCache_1.10.2
[9] dbplyr_2.1.0 IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[11] nlme_3.1-152 quadprog_1.5-8
[13] genefilter_1.68.0 minfi_1.32.0
[15] bumphunter_1.28.0 locfit_1.5-9.4
[17] iterators_1.0.13 foreach_1.5.1
[19] Biostrings_2.54.0 XVector_0.26.0
[21] SummarizedExperiment_1.16.1 DelayedArray_0.12.3
[23] BiocParallel_1.20.1 matrixStats_0.58.0
[25] Biobase_2.46.0 GenomicRanges_1.38.0
[27] GenomeInfoDb_1.22.1 IRanges_2.20.2
[29] S4Vectors_0.24.4 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] ellipsis_0.3.1 siggenes_1.60.0 mclust_5.4.7 base64_2.0
[5] bit64_4.0.5 interactiveDisplayBase_1.24.0 AnnotationDbi_1.48.0 fansi_0.4.2
[9] xml2_1.3.2 codetools_0.2-18 splines_3.6.0 cachem_1.0.4
[13] scrime_1.3.5 Rsamtools_2.2.3 annotate_1.64.0 shiny_1.6.0
[17] HDF5Array_1.14.4 BiocManager_1.30.10 readr_2.0.0 compiler_3.6.0
[21] httr_1.4.2 assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0
[25] limma_3.42.2 later_1.1.0.1 htmltools_0.5.1.1 prettyunits_1.1.1
[29] tools_3.6.0 glue_1.4.2 GenomeInfoDbData_1.2.2 dplyr_1.0.5
[33] rappdirs_0.3.3 doRNG_1.8.2 Rcpp_1.0.6 vctrs_0.3.7
[37] multtest_2.42.0 preprocessCore_1.48.0 rtracklayer_1.46.0 DelayedMatrixStats_1.8.0
[41] stringr_1.4.0 mime_0.10 lifecycle_1.0.0 rngtools_1.5
[45] XML_3.99-0.3 beanplot_1.2 zlibbioc_1.32.0 MASS_7.3-53.1
[49] hms_1.0.0 promises_1.2.0.1 rhdf5_2.30.1 GEOquery_2.54.1
[53] RColorBrewer_1.1-2 yaml_2.2.1 curl_4.3 memoise_2.0.0
[57] biomaRt_2.42.1 reshape_0.8.8 stringi_1.5.3 RSQLite_2.2.3
[61] BiocVersion_3.10.1 GenomicFeatures_1.38.2 rlang_0.4.10 pkgconfig_2.0.3
[65] bitops_1.0-6 nor1mix_1.3-0 lattice_0.20-41 purrr_0.3.4
[69] Rhdf5lib_1.8.0 GenomicAlignments_1.22.1 bit_4.0.4 tidyselect_1.1.0
[73] plyr_1.8.6 magrittr_2.0.1 R6_2.5.0 generics_0.1.0
[77] DBI_1.1.1 pillar_1.6.0 survival_3.2-7 RCurl_1.98-1.2
[81] tibble_3.1.1 crayon_1.4.1 utf8_1.2.1 tzdb_0.1.2
[85] progress_1.2.2 grid_3.6.0 data.table_1.14.0 blob_1.2.1
[89] digest_0.6.27 xtable_1.8-4 tidyr_1.1.3 httpuv_1.5.5
[93] illuminaio_0.28.0 openssl_1.4.3 askpass_1.1
Just commenting to add: I was able to set up a session with a newer R version (v.4.1.3) and Bioc version (v.3.14), and I reproduced the same two errors shown above when calling estimateCellCounts2() for FlowSorted.CordBlood.450k with compositeCellType="CordBlood".