I am using minfi to process data from Illumina EPIC methylation BeadChips, and I have found a discrepancy between the number of CpGs that GenomeStudio reports with a detection P value <0.01 and the number I get when I read the .idat files with minfi in R.
For example, in a recent batch one sample had only 801,850 CpGs with detection P <0.01 (92.5%) according to GenomeStudio, but when I read the data from the .idat files and used minfi's detectionP() function, I got 842,051 CpGs with detection P <0.01 (97.1%).
Is there an explanation for the discrepancy? This example is a fairly extreme case, but the number of good CpGs I get when reading the .idat files directly is consistently higher than what GenomeStudio reports.
Example code for getting the count of good detection P values from minfi is below, followed by the sessionInfo() output.
RGSet <- read.metharray.exp(targets = targets)
detP <- detectionP(RGSet)
dim(detP)
# [1] 866836 137
sum(detP[, "200861170017_R06C01"] < 0.01)
# [1] 842051
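To make the comparison for every sample at once rather than one column at a time, the same count can be taken over all columns of the detection P matrix. The sketch below uses a small made-up matrix as a stand-in for the real `detP` (the probe names, sample names, and values are hypothetical), so it runs without any array data:

```r
# Toy stand-in for the matrix returned by detectionP(): rows are probes,
# columns are samples. All names and values here are made up for illustration.
set.seed(1)
detP <- matrix(runif(20), nrow = 5,
               dimnames = list(paste0("cg", 1:5), paste0("sample", 1:4)))
detP[, "sample1"] <- c(0.001, 0.5, 0.002, 0.009, 0.2)

threshold <- 0.01

# Number and percentage of probes below the threshold in each sample
n_pass   <- colSums(detP < threshold)
pct_pass <- 100 * colMeans(detP < threshold)

summary_table <- data.frame(n_pass = n_pass, pct_pass = round(pct_pass, 1))
print(summary_table)
```

With the real matrix, `summary_table` would have one row per sample, which makes it easier to line the minfi counts up against the per-sample numbers exported from GenomeStudio.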
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 23 (Server Edition)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets
[9] methods base
other attached packages:
[1] Gviz_1.17.4 minfi_1.20.2 bumphunter_1.14.0
[4] locfit_1.5-9.1 iterators_1.0.8 foreach_1.4.3
[7] Biostrings_2.42.1 XVector_0.13.7 SummarizedExperiment_1.4.0
[10] GenomicRanges_1.26.4 GenomeInfoDb_1.10.3 IRanges_2.8.2
[13] S4Vectors_0.12.2 Biobase_2.34.0 BiocGenerics_0.19.2
loaded via a namespace (and not attached):
[1] nlme_3.1-131 bitops_1.0-6
[3] matrixStats_0.52.1 RColorBrewer_1.1-2
[5] httr_1.2.1 tools_3.3.0
[7] backports_1.0.5 doRNG_1.6
[9] nor1mix_1.2-2 R6_2.2.0
[11] rpart_4.1-10 Hmisc_4.0-2
[13] DBI_0.6-1 lazyeval_0.2.0
[15] colorspace_1.3-2 nnet_7.3-12
[17] gridExtra_2.2.1 base64_2.0
[19] preprocessCore_1.36.0 htmlTable_1.9
[21] pkgmaker_0.22 rtracklayer_1.34.2
[23] scales_0.4.1 checkmate_1.8.2
[25] genefilter_1.56.0 quadprog_1.5-5
[27] stringr_1.2.0 digest_0.6.12
[29] Rsamtools_1.26.1 foreign_0.8-67
[31] illuminaio_0.16.0 siggenes_1.48.0
[33] GEOquery_2.40.0 base64enc_0.1-3
[35] dichromat_2.0-0 htmltools_0.3.5
[37] BSgenome_1.41.2 ensembldb_1.5.9
[39] limma_3.30.13 htmlwidgets_0.8
[41] RSQLite_1.1-2 BiocInstaller_1.24.0
[43] shiny_1.0.1 mclust_5.2.3
[45] BiocParallel_1.8.1 acepack_1.4.1
[47] VariantAnnotation_1.19.7 RCurl_1.95-4.8
[49] magrittr_1.5 Formula_1.2-1
[51] Matrix_1.2-8 Rcpp_0.12.10
[53] munsell_0.4.3 stringi_1.1.3
[55] MASS_7.3-45 zlibbioc_1.20.0
[57] plyr_1.8.4 AnnotationHub_2.5.4
[59] lattice_0.20-35 splines_3.3.0
[61] multtest_2.30.0 GenomicFeatures_1.26.4
[63] annotate_1.52.1 knitr_1.15.1
[65] beanplot_1.2 rngtools_1.2.4
[67] codetools_0.2-15 biomaRt_2.30.0
[69] XML_3.98-1.6 biovizBase_1.22.0
[71] latticeExtra_0.6-28 data.table_1.10.4
[73] httpuv_1.3.3 gtable_0.2.0
[75] openssl_0.9.6 reshape_0.8.6
[77] assertthat_0.1 ggplot2_2.2.1
[79] mime_0.5 xtable_1.8-2
[81] survival_2.41-3 tibble_1.2
[83] GenomicAlignments_1.10.1 AnnotationDbi_1.36.2
[85] registry_0.3 memoise_1.0.0
[87] cluster_2.0.6 interactiveDisplayBase_1.12.0
> packageVersion("minfi")
[1] '1.20.2'

Thank you for the fast and helpful response, James. I also contacted Illumina tech support and they basically said the same thing -- that they are using a different algorithm to generate detection p values.
Thanks, Kasper! Here is the documentation that Illumina tech support pointed me to:
"The information we have on the algorithms used in GenomeStudio Methylation are found in the user guide, linked here: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/genomestudio/genomestudio-2011-1/genomestudio-methylation-v1-8-user-guide-11319130-b.pdf
The actual calculation for detection P value is likely inherited from the formula used for Gene Expression, which is described in the gene expression user guide on page 106: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/genomestudio/genomestudio-2011-1/genomestudio-gx-module-v1-0-user-guide-11319121-a.pdf
Besides these, we do not have more detailed information about the algorithms used, but I hope this helps."
... It seems that not even tech support can say for sure what GenomeStudio does! I imagine this documentation is exactly what you looked at before, so there may not be anything to change.
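As a rough illustration of how two detection P definitions can disagree on the same intensities, here is a sketch on simulated data. It compares a normal-tail P value, with the background centre and spread estimated from negative controls by median and MAD, against an empirical rank-based P value computed from the same negative controls. These two definitions are assumptions chosen for illustration: they are simplified, and neither is claimed to be exactly what GenomeStudio or minfi computes. All intensities are simulated.

```r
set.seed(42)

# Simulated intensities (hypothetical numbers, not real array data)
neg_controls <- rexp(600, rate = 1 / 300) + 100       # right-skewed background
probes       <- c(rnorm(900, mean = 5000, sd = 1500), # well above background
                  rnorm(100, mean = 600,  sd = 200))  # close to background

# Definition A (assumed): upper tail of a normal distribution whose centre
# and spread are estimated from the negative controls (median and MAD).
p_normal <- pnorm(probes, mean = median(neg_controls), sd = mad(neg_controls),
                  lower.tail = FALSE)

# Definition B (assumed): empirical P value, the fraction of negative
# controls whose intensity is at least as large as the probe's intensity.
p_rank <- vapply(probes, function(x) mean(neg_controls >= x), numeric(1))

# Number of probes called "detected" at P < 0.01 under each definition
c(normal = sum(p_normal < 0.01), rank = sum(p_rank < 0.01))
```

When the background is right-skewed, as simulated here, the normal-tail definition tends to call more probes detected than the rank-based one, because a normal fit to the median and MAD underestimates the upper tail of the controls. Whether that is what drives the difference in the real data would need to be checked against the actual negative control intensities.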
Great! I just sent you an email. Let me know if you need anything else.
-Brooke