Errors estimating cell types using DNA methylation cord blood reference data
metamaden


I'm sharing two errors I encountered while working with DNA methylation cord blood reference datasets in the function FlowSorted.Blood.EPIC::estimateCellCounts2().

I was only able to work around the first error, which occurs when running with default settings without manually specifying the cellTypes argument. I would greatly appreciate any help with the second error.

For my reproducible example, I loaded the dependencies and a test dataset from minfiData as follows.

# load dependencies
libv <- c("minfi","FlowSorted.Blood.EPIC","FlowSorted.CordBlood.450k","minfiData")
sapply(libv, library, character.only = TRUE)
# load example data
rg <- get(data(RGsetEx))

First, I found that the cellTypes argument needs to be specified explicitly; otherwise, the following happens:

compct <- "CordBlood"
estimateCellCounts2(rg, compositeCellType = compct)

# returns:

[estimateCellCounts2] Consider including 'nRBC' in argument 'cellTypes' for cord blood estimation.
[estimateCellCounts2] Check whether 'Gran' or 'Neu' is present in your reference and adjust argument 'cellTypes' for your estimation.
[convertArray] Casting as IlluminaHumanMethylation450k
Error in estimateCellCounts2(rg.epic, compositeCellType = compct) :
  all elements of argument 'cellTypes' needs to be part of the reference phenoData columns 'CellType' (containg the following elements: '')
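The error message states that every element of cellTypes must appear in the reference's CellType column. A minimal base-R sketch of that check, which could be run before calling estimateCellCounts2(), is below. The helper missingCellTypes() is hypothetical (it is not part of FlowSorted.Blood.EPIC), and the label vectors are illustrative assumptions, not the actual contents of any reference dataset.

```r
# Hypothetical helper (not part of FlowSorted.Blood.EPIC): return the
# requested cell types that do not occur among the reference CellType values.
missingCellTypes <- function(requested, referenceCellTypes) {
  setdiff(requested, unique(as.character(referenceCellTypes)))
}

# Illustrative labels only; inspect unique(cb$CellType) for the real values.
requested <- c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu")
reference <- c("Bcell", "CD4T", "CD8T", "Gran", "Mono", "NK", "nRBC")

absent <- missingCellTypes(requested, reference)
if (length(absent) > 0) {
  message("Not in reference: ", paste(absent, collapse = ", "))
}
```

With the reference loaded as cb (as in the next snippet), the same check would be missingCellTypes(ctv, cb$CellType). Note that the error above reports the reference column as empty (''), so a label mismatch alone may not explain this case.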

For anyone reading, or in case the package authors wish to address this, I was able to bypass the error by taking the CellType values from the reference dataset directly. However, this then produces the second error:

cb <- get(data(FlowSorted.CordBlood.450k))
ctv <- unique(cb$CellType)
estimateCellCounts2(rg, compositeCellType = compct, cellTypes = ctv)

# returns:

[convertArray] Casting as IlluminaHumanMethylation450k
[estimateCellCounts2] Combining user data with reference (flow sorted) data.
[estimateCellCounts2] Processing user and reference data together.
[estimateCellCounts2] Picking probes for composition estimation.
Error in p[trainingProbes, ] : subscript out of bounds

When I run traceback(), the result is:

traceback()

# returns:

2: pickCompProbes(referenceMset, cellTypes = cellTypes, compositeCellType = compositeCellType,
       probeSelect = probeSelect)
1: estimateCellCounts2(rg, compositeCellType = compct, cellTypes = ctv)

The error comes from an out-of-bounds index when subsetting some object p by something called trainingProbes. I also encountered the same error when trying this approach with the EPIC array example RGChannelSet from minfiDataEPIC (i.e. using library(minfiDataEPIC); rg.epic <- get(data(RGsetEPIC))).
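As a minimal base-R illustration of this kind of failure: subsetting a matrix's rows by a character vector that contains NA raises "subscript out of bounds", because NA does not match any row name. The matrix below is a made-up stand-in for p, and the assumption that trainingProbes contains NA values here is for illustration only.

```r
# Toy matrix standing in for 'p'; row names play the role of probe IDs.
# All values and names are made up for illustration.
p <- matrix(1:6, nrow = 3,
            dimnames = list(c("cg01", "cg02", "cg03"), c("A", "B")))

# A probe-name vector containing one NA entry
trainingProbes <- c("cg01", NA)

# Character row subsetting with NA fails; capture the error message
res <- tryCatch(p[trainingProbes, ],
                error = function(e) conditionMessage(e))
print(res)
```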

When I used the above approach (i.e. specifying cellTypes manually) for the other cord blood references listed in the function documentation ("CordBloodNorway", "CordBloodCombined", and "CordTissueAndBlood"), I encountered the same error in all cases except for the "CordBloodNorway" dataset (provided by the FlowSorted.CordBloodNorway.450k package).

I wasn't able to download the dataset for "CordBloodCombined" (e.g. using BiocManager::install("FlowSorted.CordBloodCombined.450k")), but that may be because I am running older versions of R (3.6.0) and Bioconductor (3.10). I also wasn't able to install the FlowSorted.CordBlood.EPIC package on a machine running a newer R version (4.2.0) with Bioconductor 3.15, so I could not test my findings in a newer environment at the time of writing.

Thanks in advance for any help!

Best regards,


Here is my session info:

sessionInfo()

# returns:

R version 3.6.0 (2019-04-26)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] minfiData_0.32.0                                    IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 
 [3] IlluminaHumanMethylation450kmanifest_0.4.0          FlowSorted.CordBlood.450k_1.14.0                   
 [5] FlowSorted.Blood.EPIC_1.4.1                         ExperimentHub_1.12.0                               
 [7] AnnotationHub_2.18.0                                BiocFileCache_1.10.2                               
 [9] dbplyr_2.1.0                                        IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[11] nlme_3.1-152                                        quadprog_1.5-8                                     
[13] genefilter_1.68.0                                   minfi_1.32.0                                       
[15] bumphunter_1.28.0                                   locfit_1.5-9.4                                     
[17] iterators_1.0.13                                    foreach_1.5.1                                      
[19] Biostrings_2.54.0                                   XVector_0.26.0                                     
[21] SummarizedExperiment_1.16.1                         DelayedArray_0.12.3                                
[23] BiocParallel_1.20.1                                 matrixStats_0.58.0                                 
[25] Biobase_2.46.0                                      GenomicRanges_1.38.0                               
[27] GenomeInfoDb_1.22.1                                 IRanges_2.20.2                                     
[29] S4Vectors_0.24.4                                    BiocGenerics_0.32.0                                

loaded via a namespace (and not attached):
 [1] ellipsis_0.3.1                siggenes_1.60.0               mclust_5.4.7                  base64_2.0                   
 [5] bit64_4.0.5                   interactiveDisplayBase_1.24.0 AnnotationDbi_1.48.0          fansi_0.4.2                  
 [9] xml2_1.3.2                    codetools_0.2-18              splines_3.6.0                 cachem_1.0.4                 
[13] scrime_1.3.5                  Rsamtools_2.2.3               annotate_1.64.0               shiny_1.6.0                  
[17] HDF5Array_1.14.4              BiocManager_1.30.10           readr_2.0.0                   compiler_3.6.0               
[21] httr_1.4.2                    assertthat_0.2.1              Matrix_1.3-2                  fastmap_1.1.0                
[25] limma_3.42.2                  later_1.1.0.1                 htmltools_0.5.1.1             prettyunits_1.1.1            
[29] tools_3.6.0                   glue_1.4.2                    GenomeInfoDbData_1.2.2        dplyr_1.0.5                  
[33] rappdirs_0.3.3                doRNG_1.8.2                   Rcpp_1.0.6                    vctrs_0.3.7                  
[37] multtest_2.42.0               preprocessCore_1.48.0         rtracklayer_1.46.0            DelayedMatrixStats_1.8.0     
[41] stringr_1.4.0                 mime_0.10                     lifecycle_1.0.0               rngtools_1.5                 
[45] XML_3.99-0.3                  beanplot_1.2                  zlibbioc_1.32.0               MASS_7.3-53.1                
[49] hms_1.0.0                     promises_1.2.0.1              rhdf5_2.30.1                  GEOquery_2.54.1              
[53] RColorBrewer_1.1-2            yaml_2.2.1                    curl_4.3                      memoise_2.0.0                
[57] biomaRt_2.42.1                reshape_0.8.8                 stringi_1.5.3                 RSQLite_2.2.3                
[61] BiocVersion_3.10.1            GenomicFeatures_1.38.2        rlang_0.4.10                  pkgconfig_2.0.3              
[65] bitops_1.0-6                  nor1mix_1.3-0                 lattice_0.20-41               purrr_0.3.4                  
[69] Rhdf5lib_1.8.0                GenomicAlignments_1.22.1      bit_4.0.4                     tidyselect_1.1.0             
[73] plyr_1.8.6                    magrittr_2.0.1                R6_2.5.0                      generics_0.1.0               
[77] DBI_1.1.1                     pillar_1.6.0                  survival_3.2-7                RCurl_1.98-1.2               
[81] tibble_3.1.1                  crayon_1.4.1                  utf8_1.2.1                    tzdb_0.1.2                   
[85] progress_1.2.2                grid_3.6.0                    data.table_1.14.0             blob_1.2.1                   
[89] digest_0.6.27                 xtable_1.8-4                  tidyr_1.1.3                   httpuv_1.5.5                 
[93] illuminaio_0.28.0             openssl_1.4.3                 askpass_1.1

Just commenting to add: I was able to set up a session with a newer R version (4.1.3) and Bioconductor version (3.14), and I reproduced the same two errors shown above when calling estimateCellCounts2() for FlowSorted.CordBlood.450k with compositeCellType = "CordBlood".

metamaden

I was testing the above example code with the function estimateCellCounts() from minfi and encountered the same error. I was able to track its source to a call to the unexported function minfi:::pickCompProbes(), where it looks like NA values in trainingProbes are simply causing the error. Adding an NA filter for trainingProbes should fix the issue for the cord blood references and for any future references in which NA values are encountered. I've opened a pull request on the minfi GitHub repository. This issue is resolved.
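A minimal base-R sketch of how NA probe names could arise and how an NA filter removes them. It assumes, for illustration only, that probes are chosen by taking the top n names ranked by some statistic; indexing past the end of a shorter vector then pads the result with NA. The function pickTopProbes() and the statistics are hypothetical and are not minfi's actual pickCompProbes() code.

```r
# Hypothetical selector (not minfi's code): names of the top n probes by a
# statistic. If fewer than n probes exist, indexing past the end yields NA.
pickTopProbes <- function(stat, n) {
  names(sort(stat, decreasing = TRUE))[seq_len(n)]
}

stat <- c(cg01 = 5.2, cg02 = 3.1, cg03 = 4.7)   # made-up statistics
trainingProbes <- pickTopProbes(stat, n = 5)    # asks for more than exist
# trainingProbes is c("cg01", "cg03", "cg02", NA, NA)

# NA filter on the selected probe names
trainingProbes <- trainingProbes[!is.na(trainingProbes)]

# Row subsetting now succeeds
p <- matrix(seq_len(6), nrow = 3,
            dimnames = list(names(stat), c("A", "B")))
pSub <- p[trainingProbes, , drop = FALSE]
```

Filtering after selection keeps the ranking order of the probes that do exist, so the subset matrix simply has fewer rows than requested rather than failing.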

