Difference in detection p-values from minfi vs. GenomeStudio

I am using minfi to process data from Illumina EPIC methylation BeadChips, and I have found a discrepancy between the number of CpGs that GenomeStudio reports with a detection P value < 0.01 and the number that I get by reading in the .idat files with minfi in R.

For example, in a recent batch, one sample had only 801,850 CpGs with detection P values < 0.01 (92.5%) according to GenomeStudio, but when I read in the data from the .idat files and used the detectionP() function in minfi, I got 842,051 CpGs with detection P < 0.01 (97.1%).

Is there an explanation for the discrepancy? This example has a pretty extreme difference, but the number of good CpGs I get when I read from the idat files directly is consistently higher than what comes out of GenomeStudio.

Example code for getting the count of good detection P values from minfi, along with my sessionInfo(), is below.

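library(minfi)

# 'targets' is the sample sheet data frame (e.g. from read.metharray.sheet())
RGSet <- read.metharray.exp(targets = targets)
# Detection p-value for every probe (rows) in every sample (columns)
detP <- detectionP(RGSet)
dim(detP)
# [1] 866836    137
# Number of probes with detection p < 0.01 in one sample
sum(detP[, "200861170017_R06C01"] < 0.01)
# [1] 842051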
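
To compare pass rates across all samples at once rather than one column at a time, something like the following could be used. This is only a minimal sketch in base R: the matrix `detP_toy`, its probe and sample names, and its values are invented for illustration and stand in for the real `detP` matrix. The last line simply rechecks the percentages quoted above from the reported counts.

```r
# Toy detection p-value matrix: 5 probes (rows) x 2 samples (columns).
# All values and names here are invented for illustration only.
detP_toy <- matrix(
  c(0.000, 0.200,
    0.001, 0.005,
    0.030, 0.000,
    0.000, 0.009,
    0.500, 0.000),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("cg", 1:5), c("sampleA", "sampleB"))
)

cutoff <- 0.01
n_pass   <- colSums(detP_toy < cutoff)          # passing probes per sample
pct_pass <- 100 * colMeans(detP_toy < cutoff)   # same, as a percentage
print(data.frame(n_pass, pct_pass))

# Percentages implied by the counts in the post and the 866,836 probes
pct_reported <- round(100 * c(GenomeStudio = 801850, minfi = 842051) / 866836, 1)
print(pct_reported)
```

With the real data, replacing `detP_toy` with `detP` would give one count and one percentage per sample, which makes it easier to see whether the gap to GenomeStudio is consistent across the batch.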

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 23 (Server Edition)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets
[9] methods   base

other attached packages:
[1] Gviz_1.17.4                minfi_1.20.2               bumphunter_1.14.0
[4] locfit_1.5-9.1             iterators_1.0.8            foreach_1.4.3
[7] Biostrings_2.42.1          XVector_0.13.7             SummarizedExperiment_1.4.0
[10] GenomicRanges_1.26.4       GenomeInfoDb_1.10.3        IRanges_2.8.2
[13] S4Vectors_0.12.2           Biobase_2.34.0             BiocGenerics_0.19.2

loaded via a namespace (and not attached):
[1] nlme_3.1-131                  bitops_1.0-6
[3] matrixStats_0.52.1            RColorBrewer_1.1-2
[5] httr_1.2.1                    tools_3.3.0
[7] backports_1.0.5               doRNG_1.6
[9] nor1mix_1.2-2                 R6_2.2.0
[11] rpart_4.1-10                  Hmisc_4.0-2
[13] DBI_0.6-1                     lazyeval_0.2.0
[15] colorspace_1.3-2              nnet_7.3-12
[17] gridExtra_2.2.1               base64_2.0
[19] preprocessCore_1.36.0         htmlTable_1.9
[21] pkgmaker_0.22                 rtracklayer_1.34.2
[23] scales_0.4.1                  checkmate_1.8.2
[27] stringr_1.2.0                 digest_0.6.12
[29] Rsamtools_1.26.1              foreign_0.8-67
[31] illuminaio_0.16.0             siggenes_1.48.0
[33] GEOquery_2.40.0               base64enc_0.1-3
[35] dichromat_2.0-0               htmltools_0.3.5
[37] BSgenome_1.41.2               ensembldb_1.5.9
[39] limma_3.30.13                 htmlwidgets_0.8
[41] RSQLite_1.1-2                 BiocInstaller_1.24.0
[43] shiny_1.0.1                   mclust_5.2.3
[45] BiocParallel_1.8.1            acepack_1.4.1
[47] VariantAnnotation_1.19.7      RCurl_1.95-4.8
[49] magrittr_1.5                  Formula_1.2-1
[51] Matrix_1.2-8                  Rcpp_0.12.10
[53] munsell_0.4.3                 stringi_1.1.3
[55] MASS_7.3-45                   zlibbioc_1.20.0
[57] plyr_1.8.4                    AnnotationHub_2.5.4
[59] lattice_0.20-35               splines_3.3.0
[61] multtest_2.30.0               GenomicFeatures_1.26.4
[63] annotate_1.52.1               knitr_1.15.1
[65] beanplot_1.2                  rngtools_1.2.4
[67] codetools_0.2-15              biomaRt_2.30.0
[69] XML_3.98-1.6                  biovizBase_1.22.0
[71] latticeExtra_0.6-28           data.table_1.10.4
[73] httpuv_1.3.3                  gtable_0.2.0
[75] openssl_0.9.6                 reshape_0.8.6
[77] assertthat_0.1                ggplot2_2.2.1
[79] mime_0.5                      xtable_1.8-2
[81] survival_2.41-3               tibble_1.2
[83] GenomicAlignments_1.10.1      AnnotationDbi_1.36.2
[85] registry_0.3                  memoise_1.0.0
[87] cluster_2.0.6                 interactiveDisplayBase_1.12.0
> packageVersion("minfi")
[1] '1.20.2'

@james-w-macdonald-5106

The obvious answer is that the two programs use different algorithms to determine which probes are actually measuring something, and apparently GenomeStudio is a bit more conservative. However, it is probably not possible to say much more than that, because GenomeStudio's code isn't (AFAIK) available for you to peruse.

It's easy to know what minfi is doing, because it's open source: you can look at the code, read the help page, or read the paper(s) describing how minfi works (or probably all of those things). So you can know exactly how minfi determines whether a given probe is detecting anything above background.
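
As a rough illustration of that kind of calculation, here is a minimal base R sketch of a normal-tail detection p-value for one sample. It follows my reading of what `detectionP()` does (total methylated plus unmethylated intensity compared against a normal background whose mean and spread come from the median and MAD of the negative control probes), but it is not minfi's code. All intensities and probe labels are simulated or invented, and the real function also handles Type I and Type II probes separately, so check the actual source before relying on any detail.

```r
set.seed(1)
# Simulated negative-control intensities for one sample (invented numbers).
neg_red   <- rnorm(600, mean = 300, sd = 40)
neg_green <- rnorm(600, mean = 250, sd = 35)

# Robust background estimates from the negative controls:
# median for the centre, MAD for the spread, summed over both channels.
bg_mean <- median(neg_red) + median(neg_green)
bg_sd   <- mad(neg_red) + mad(neg_green)

# Total (methylated + unmethylated) intensity of three hypothetical probes.
total_intensity <- c(weak = 560, moderate = 800, strong = 5000)

# Upper-tail normal probability: the chance of seeing at least this much
# signal if the probe were measuring background only.
det_p <- pnorm(total_intensity, mean = bg_mean, sd = bg_sd, lower.tail = FALSE)
print(signif(det_p, 3))
```

Probes whose total intensity sits close to the negative-control level get large p-values, and probes far above it get p-values near zero.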

It's a bit harder to know what GenomeStudio is doing. Even if they have a whitepaper or a description somewhere, it is often impossible to know exactly what happens under the hood, because a general description (even a quite detailed one) is not likely to be as precise as the underlying code.


Thank you for the fast and helpful response, James. I also contacted Illumina tech support and they basically said the same thing -- that they are using a different algorithm to generate detection p values.

The implementation in minfi is my interpretation of the only documentation I could find on what GenomeStudio actually does, which was a couple of sentences in the manual. If the manual has been expanded, or there is more information somewhere else, I am happy to take a look at it and consider changing minfi.

Kasper

Thanks, Kasper! Here is the documentation that Illumina tech support pointed me to:

"The information we have on the algorithms used in GenomeStudio Methylation are found in the user guide, linked here: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/genomestudio/genomestudio-2011-1/genomestudio-methylation-v1-8-user-guide-11319130-b.pdf

The actual calculation for detection P value is likely inherited from the formula used for Gene Expression, which is described in the gene expression user guide on page 106: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/genomestudio/genomestudio-2011-1/genomestudio-gx-module-v1-0-user-guide-11319121-a.pdf

Besides these, we do not have more detailed information about the algorithms used, but I hope this helps."

... It seems that not even tech support can say for sure what GenomeStudio does! I imagine this documentation is exactly what you looked at before, so there may not be anything to change.

That's useful, especially the expression link. Do you have a couple of IDAT files together with the output from GenomeStudio? I could implement their (weird) p-value calculation and we could check whether we get the same values as GenomeStudio. You can share the files with me using, for example, Dropbox.

Best,
Kasper
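
For readers wondering what such a comparison might look like, the sketch below contrasts a rank-based (empirical) detection p-value with a normal-tail one on simulated data. The rank formula used here, p = 1 - R/N with R the rank of the probe signal among N negative controls, is an assumption about what the gene expression guide describes. It has not been confirmed as GenomeStudio's methylation calculation, and all intensities and probe labels are invented.

```r
set.seed(2)
# Simulated negative-control intensities for one sample and one channel.
neg_controls <- rnorm(600, mean = 300, sd = 40)

# Signals of three hypothetical probes.
probe_signal <- c(weak = 290, moderate = 400, strong = 5000)

# Assumed rank-based p-value: R = number of negative controls at or below
# the probe signal, N = number of negative controls, p = 1 - R / N.
N <- length(neg_controls)
R <- vapply(probe_signal, function(x) sum(neg_controls <= x), integer(1))
p_rank <- 1 - R / N

# Normal-tail p-value computed from the same controls, for comparison.
p_norm <- pnorm(probe_signal,
                mean = median(neg_controls),
                sd = mad(neg_controls),
                lower.tail = FALSE)

print(data.frame(p_rank, p_norm))
```

One property worth noting is that a rank-based p-value can only take the values 0, 1/N, 2/N, and so on, so with a few hundred negative controls it is quite coarse near a 0.01 cutoff, whereas the normal-tail version is continuous. Whether that explains the difference in counts reported above can only be settled by running both calculations on real IDAT files alongside the GenomeStudio output.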

Great! I just sent you an email. Let me know if you need anything else.

-Brooke