The short answer is that it's expected. The long answer is more complicated.
You should note that those aren't really NA
values, but instead are NaN
values. Which you should expect if you divide by zero. In addition, when you run getBeta
on an RGChannelSet
, the first thing that happens is preprocessRaw
is used to convert to a MethylSet
, because you can't get the betas without some sort of pre-processing. And preprocessRaw
doesn't do anything but compute the raw methylated and unmethylated values. As an example:
> z <- preprocessRaw(RGsetEx)
> rbind(getMeth(z)["cg00541303",], getUnmeth(z)["cg00541303",])
5723646052_R02C02 5723646052_R04C01 5723646052_R05C02 5723646053_R04C02
[1,] 4721 5427 1616 2535
[2,] 2827 2564 4094 3876
5723646053_R05C02 5723646053_R06C02
[1,] 0 878
[2,] 0 3411
## The fifth sample will be NaN because the beta will be 0/0
## Those same values after quantile normalization
> rbind(M["cg00541303",], U["cg00541303",])
5723646052_R02C02 5723646052_R04C01 5723646052_R05C02 5723646053_R04C02
[1,] 5004.833 4700.000 2208.167 2405.167
[2,] 3219.333 2671.667 3467.583 3463.167
5723646053_R05C02 5723646053_R06C02
[1,] 89.41791 1525.750
[2,] 39.19927 4008.167
## which will no longer generate an NaN beta value
Doing any sort of normalization will tend to remove any NaN
beta values, because you won't have any zero methylation or unmethylation values. But you can still get rid of CpGs that appear to be undetected in more than one sample:
> detectionP(RGsetEx)["cg00541303",]
5723646052_R02C02 5723646052_R04C01 5723646052_R05C02 5723646053_R04C02
0.000000e+00 0.000000e+00 4.387266e-305 1.330263e-288
5723646053_R05C02 5723646053_R06C02
1.000000e+00 5.385087e-67
I am going to add an example to reproduce what I am saying, with the minfiData package:
head(rawb[idx,]): 5723646052_R02C02 5723646052_R04C01 5723646052_R05C02 5723646053_R04C02 5723646053_R05C02 5723646053_R06C02 cg00541303 0.62546370 0.67913903 0.2830123 0.39541413 NaN 0.20470972 cg12211867 0.28244096 0.33997644 0.7489393 0.65331968 0.4373757 NaN cg21422909 0.06863271 NaN 0.1095389 0.08551037 0.1019352 0.08451947 cg01119172 0.82459109 0.72714932 0.7325581 0.80993602 NaN 0.72307692 cg24494008 0.10291859 0.08823529 0.1718062 NaN 0.1111831 0.23105263 cg01157718 NaN 0.38190955 0.8803734 0.72485992 0.8875035 0.67151163
head(normb[idx,]) 5723646052_R02C02 5723646052_R04C01 5723646052_R05C02 5723646053_R04C02 5723646053_R05C02 5723646053_R06C02 cg00541303 0.60855203 0.63757631 0.3890528 0.40985515 0.6952252 0.27570889 cg12211867 0.34563935 0.39431849 0.7164490 0.59228549 0.4622814 0.47585709 cg21422909 0.07522882 0.51918442 0.1397407 0.08466117 0.0844101 0.09255525 cg01119172 0.76768695 0.68974381 0.7780127 0.81666941 0.6952252 0.82167865 cg24494008 0.12813159 0.09321829 0.1666078 0.51918442 0.1081707 0.19789654 cg01157718 0.67244123 0.38078366 0.8881847 0.73250020 0.8763427 0.73704905
I do not understand why these methylation values seem to appear from nowhere. Is there an imputation process?
Session info:
R Under development (unstable) (2020-11-08 r79408) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 18.04.3 LTS
Matrix products: default BLAS: /home/omorante/R-devel/lib/libRblas.so LAPACK: /home/omorante/R-devel/lib/libRlapack.so
locale: [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=en_AU.UTF-8 LC_COLLATE=es_ES.UTF-8 LC_MONETARY=en_AU.UTF-8
[6] LC_MESSAGES=es_ES.UTF-8 LC_PAPER=en_AU.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 IlluminaHumanMethylation450kmanifest_0.4.0
[3] minfi_1.37.0 bumphunter_1.33.0
[5] locfit_1.5-9.4 iterators_1.0.13
[7] foreach_1.5.1 Biostrings_2.59.0
[9] XVector_0.31.0 SummarizedExperiment_1.21.0
[11] Biobase_2.51.0 MatrixGenerics_1.3.0
[13] matrixStats_0.57.0 GenomicRanges_1.43.0
[15] GenomeInfoDb_1.27.0 IRanges_2.25.2
[17] S4Vectors_0.29.3 BiocGenerics_0.37.0
loaded via a namespace (and not attached): [1] nlme_3.1-150 bitops_1.0-6 bit64_4.0.5 RColorBrewer_1.1-2 progress_1.2.2
[6] httr_1.4.2 tools_4.1.0 doRNG_1.8.2 nor1mix_1.3-0 R6_2.5.0
[11] HDF5Array_1.19.0 DBI_1.1.0 rhdf5filters_1.3.0 tidyselect_1.1.0 prettyunits_1.1.1
[16] base64_2.0 minfiData_0.37.0 bit_4.0.4 curl_4.3 compiler_4.1.0
[21] preprocessCore_1.53.0 xml2_1.3.2 DelayedArray_0.17.0 rtracklayer_1.51.0 readr_1.4.0
[26] quadprog_1.5-8 genefilter_1.73.0 askpass_1.1 rappdirs_0.3.1 Rsamtools_2.7.0
[31] stringr_1.4.0 digest_0.6.27 illuminaio_0.33.0 siggenes_1.65.0 GEOquery_2.59.0
[36] pkgconfig_2.0.3 scrime_1.3.5 sparseMatrixStats_1.3.0 dbplyr_2.0.0 limma_3.47.0
[41] rlang_0.4.8 rstudioapi_0.11 RSQLite_2.2.1 DelayedMatrixStats_1.13.0 generics_0.1.0
[46] mclust_5.4.6 BiocParallel_1.25.1 dplyr_1.0.2 RCurl_1.98-1.2 magrittr_1.5
[51] GenomeInfoDbData_1.2.4 Matrix_1.2-18 Rcpp_1.0.5 Rhdf5lib_1.13.0 lifecycle_0.2.0
[56] stringi_1.5.3 MASS_7.3-53 zlibbioc_1.37.0 rhdf5_2.35.0 plyr_1.8.6
[61] BiocFileCache_1.15.0 grid_4.1.0 blob_1.2.1 crayon_1.3.4 lattice_0.20-41
[66] splines_4.1.0 multtest_2.47.0 GenomicFeatures_1.43.1 annotate_1.69.0 hms_0.5.3
[71] beanplot_1.2 pillar_1.4.6 rngtools_1.5 codetools_0.2-18 biomaRt_2.47.0
[76] XML_3.99-0.5 glue_1.4.2 data.table_1.13.2 vctrs_0.3.4 tidyr_1.1.2
[81] openssl_1.4.3 purrr_0.3.4 reshape_0.8.8 assertthat_0.2.1 xtable_1.8-4
[86] survival_3.2-7 tibble_3.0.4 GenomicAlignments_1.27.1 AnnotationDbi_1.53.0 memoise_1.1.0
[91] ellipsis_0.3.1