hoeffd function (CRAN Hmisc package) use in arrayQualityMetrics
Guest User ★ 13k
@guest-user-4897
Last seen 9.7 years ago
ArrayQualityMetrics uses hoeffd to assess any linear relationship between M and A in MA plots and flags arrays where D > 0.15. Some of the matrices I'm evaluating have NA's; those result from filtering, for example in applying Illumina beadarray detection p-value rules.

As you can see from evaluating the script below, NA's affect hoeffd. In my situation, arrays with no strong linear relationship are flagged by this AQM measure.

Is this a bug in hoeffd? Should AQM address this, perhaps by removing NA's in M and A before applying hoeffd?

# Filename: hoeffd.R
#
# Test Hmisc hoeffd() response to NA's in a matrix. Used by
# AQM to assess outliers in the MA plots.
#
# J Davison 5dec2013
#
#--> source('hoeffd.R',echo=TRUE,max=Inf)
#

library(Hmisc)
library(arrayQualityMetrics)

set.seed(11)
M = runif(10)*10

set.seed(7)
A = runif(10)*10

df1 = data.frame(M, A)
hoeffd(as.matrix(df1))$D
#        M      A
# M 1.0000 0.0437
# A 0.0437 1.0000

### Add NA's
df2 = rbind(df1, data.frame(M=rep(NA, 2), A=rep(NA, 2)))
hoeffd(as.matrix(df2))$D
#        M      A
# M 1.0000 0.0959
# A 0.0959 1.0000

### Add more NA's
df3 = rbind(df1, data.frame(M=rep(NA, 4), A=rep(NA, 4)))
hoeffd(as.matrix(df3))$D
#       M     A
# M 1.000 0.163
# A 0.163 1.000

-- output of sessionInfo():

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] splines   grid      stats     graphics
[5] grDevices utils     datasets  methods
[9] base

other attached packages:
[1] arrayQualityMetrics_3.18.0
[2] Hmisc_3.13-0
[3] Formula_1.1-1
[4] survival_2.37-4
[5] lattice_0.20-24
[6] cluster_1.14.4
[7] BiocInstaller_1.12.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.23.28 BeadDataPackR_1.14.0
 [3] Biobase_2.21.7        BiocGenerics_0.7.8
 [5] Biostrings_2.30.0     Cairo_1.5-2
 [7] DBI_0.2-7             IRanges_1.20.5
 [9] RColorBrewer_1.0-5    RSQLite_0.11.4
[11] SVGAnnotation_0.93-1  XML_3.98-1.1
[13] XVector_0.1.4         affy_1.39.6
[15] affyPLM_1.38.0        affyio_1.29.5
[17] annotate_1.39.0       beadarray_2.12.0
[19] colorspace_1.2-4      gcrma_2.34.0
[21] genefilter_1.43.0     hwriter_1.3
[23] latticeExtra_0.6-26   limma_3.18.2
[25] parallel_3.0.2        plyr_1.8
[27] preprocessCore_1.23.0 reshape2_1.2.2
[29] setRNG_2011.11-2      stats4_3.0.2
[31] stringr_0.6.2         vsn_3.29.1
[33] xtable_1.7-1          zlibbioc_1.7.0

--
Sent via the guest posting facility at bioconductor.org.
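The workaround raised in the question (dropping rows with NA in M or A before computing the statistic) could be sketched roughly as follows. This is only an illustration: `drop_incomplete_rows` is a hypothetical helper, not a function in Hmisc or arrayQualityMetrics, and the final `hoeffd` call runs only if Hmisc is installed.

```r
# Hypothetical helper (not part of Hmisc or arrayQualityMetrics):
# keep only the rows in which every column (here M and A) is observed.
drop_incomplete_rows <- function(x) {
  x <- as.matrix(x)
  x[complete.cases(x), , drop = FALSE]
}

# Same simulated data as in the script above.
set.seed(11)
M <- runif(10) * 10
set.seed(7)
A <- runif(10) * 10
df1 <- data.frame(M, A)
df3 <- rbind(df1, data.frame(M = rep(NA, 4), A = rep(NA, 4)))

clean <- drop_incomplete_rows(df3)

# After filtering, the input is the same as df1, so a statistic computed
# on it cannot depend on how many all-NA rows were appended.
stopifnot(nrow(clean) == 10, !any(is.na(clean)))
stopifnot(isTRUE(all.equal(clean, as.matrix(df1), check.attributes = FALSE)))

# Optional: compute Hoeffding's D on the filtered matrix if Hmisc is available.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::hoeffd(clean)$D)
}
```

Because the filtered matrix equals the original 10-row input, the D value it yields should be the one reported for `df1`, whatever the number of NA rows that were added.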
@wolfgang-huber-3550
Last seen 25 days ago
EMBL European Molecular Biology Laborat…
Dear Jerry,

Bottom line: I am not (at least not yet) sure that the answer to your problem is a change to arrayQualityMetrics. Why don't you remove the apparently large fraction of 'bad' features from your data before calling arrayQualityMetrics?

Long answer:

1. Another option is to decide on your favourite alternative statistic to 'Hmisc::hoeffd' and patch the function "aqm.maplot" to use that one. The vignette "Advanced topics: Customizing arrayQualityMetrics reports and programmatic processing of the output" tells you how to do so. Any comments on that are welcome. (There was never an intention that the default choices of arrayQualityMetrics would always be optimal for everybody.)

2. Your example is somewhat contrived: you simulate an "array" with 10 genes and then add 2 or 4 genes with NA, which amounts to 17% or 29% of genes having NA. If the fraction of NA is smaller, then the effect on the result of 'hoeffd' is not so strong. The function (which resides in the Hmisc package) is designed to work with outliers. I am not familiar with your arrays and filtering procedure, but maybe an array with so many NA should be flagged as having "bad quality" anyway?

Best wishes,
Wolfgang

On Dec 9, 2013, at 5:48 pm, jerry davison [guest] <guest at bioconductor.org> wrote:

> ArrayQualityMetrics uses hoeffd to assess any linear relationship between M and A in MA plots and flags arrays where D > 0.15. Some of the matrices I'm evaluating have NA's; those result from filtering, for example in applying Illumina beadarray detection p-value rules.
>
> As you can see from evaluating the script below, NA's affect hoeffd. In my situation, arrays with no strong linear relationship are flagged by this AQM measure.
>
> Is this a bug in hoeffd? Should AQM address this, perhaps by removing NA's in M and A before applying hoeffd?
>
> [...]
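As one possible reading of point 1 in the reply, an alternative statistic that ignores incomplete pairs might look like the sketch below. The function name `ma_dependence` is hypothetical, and Spearman's rank correlation is simply a stand-in choice; it is not the arrayQualityMetrics default. The sketch does not show how to wire the function into "aqm.maplot"; the vignette mentioned in the reply covers that.

```r
# Hypothetical replacement statistic (an assumption for illustration only):
# absolute Spearman rank correlation between M and A, computed on the
# pairs in which both values are observed.
ma_dependence <- function(M, A) {
  abs(cor(M, A, method = "spearman", use = "complete.obs"))
}

# Same simulated data as in the question.
set.seed(11)
M <- runif(10) * 10
set.seed(7)
A <- runif(10) * 10

s0 <- ma_dependence(M, A)
s4 <- ma_dependence(c(M, rep(NA, 4)), c(A, rep(NA, 4)))

# Appending all-NA pairs leaves the statistic unchanged, because
# use = "complete.obs" drops them before the ranks are computed.
stopifnot(isTRUE(all.equal(s0, s4)))
```

Note that Spearman's correlation responds only to monotone trends, whereas Hoeffding's D is sensitive to more general forms of dependence, so the 0.15 cutoff used for D would not carry over and a new threshold would have to be chosen.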