problem normalizeMedians (limma)
@gordon-smyth
WEHI, Melbourne, Australia
>>Date: Mon, 17 Oct 2005 18:24:47 -0400
>>From: Francois Pepin <fpepin at cs.mcgill.ca>
>>Subject: [BioC] problem normalizeMedians (limma)
>>To: bioconductor at stat.math.ethz.ch
>>
>>Hi everyone,
>>
>>I'm not sure what should be done about it, but I've been surprised by
>>the following behavior with normalizeBetweenArrays:
>>
>>(dat is an MAList here)
>>> dat$A[1,1]
>>[1] 8.796361
>>> dat <- normalizeBetweenArrays(dat)
>>> dat$A[1,1]
>>[1] Inf
>>
>>This is due to the fact that I've got almost 400 arrays here.
>>normalizeMedians(x) (which is called by normalizeBetweenArrays) has the
>>following behavior:
>>
>>  a.med <- apply(x, 2, median, na.rm = TRUE)
>>  a.med <- a.med/(prod(a.med))^(1/narrays)
>>
>>Multiplying 400 relatively small numbers can easily exceed R's limit
>>for doubles:
>>
>>> 6^400
>>[1] Inf
>>
>>Wouldn't it be better to work in log space instead?

Yes, you're right. In limma 2.3.2 I will change

  (prod(a.med))^(1/narrays)

to

  exp(mean(log(a.med)))

which should make floating overflow or underflow very unlikely.

BTW, the "scale" method has been the default for normalizeBetweenArrays()
for historic reasons, inherited from the older sma package. I'm going to
change the default to "Aquantile".

Gordon

>>Francois
>>
>>> sessionInfo()
>>R version 2.1.0, 2005-04-18, x86_64-unknown-linux-gnu
>>
>>attached base packages:
>>[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
>>[7] "base"
>>
>>other attached packages:
>>  limma
>>"1.9.6"
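The two expressions are equivalent ways of computing the geometric mean of
the column medians; the log-space form simply never builds the huge
intermediate product. A minimal sketch of the difference, illustration only
rather than limma code:

  a.med <- rep(6, 400)               # 400 hypothetical column medians
  (prod(a.med))^(1/length(a.med))    # Inf: prod() overflows before the root is taken
  exp(mean(log(a.med)))              # 6: the same geometric mean, computed safely

Once prod(a.med) overflows to Inf, the scale factors a.med/Inf collapse to
zero, and dividing the data by zero scale factors is what produces the Inf
values Francois saw.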
Marcus Davy ▴ 680
@marcus-davy-374
Hi,

Does this include modifying normalizeMedianDeviations as well, for
consistency? I suspect normalizeMedianDeviations doesn't reach R's limit
as quickly.

Marcus

MA <- new("RGList")
n <- 200
genes <- 1000
MA$M <- matrix(runif(n*genes, -8, 8), nrow=genes, ncol=n)
MA$A <- matrix(runif(n*genes, 1, 16), nrow=genes, ncol=n)
summary(c(MA$M))
summary(c(MA$A))
any(is.infinite(normalizeMedians(MA$A)))

normalizeMedians2 <- function(x) {
    narrays <- NCOL(x)
    if (narrays == 1) return(x)
    # Geometric-mean scaling, computed entirely in log space
    a.med <- log(apply(x, 2, median, na.rm = TRUE))
    a.med <- exp(a.med - mean(a.med))
    t(t(x)/a.med)
}

# Subset
normalizeMedians(MA$A[1:5, 1:5])
normalizeMedians2(MA$A[1:5, 1:5])

# Entire dataset
any(is.infinite(normalizeMedians(MA$A)))
any(is.infinite(normalizeMedians2(MA$A)))

normalizeMedianDeviations2 <- function(x) {
    narrays <- NCOL(x)
    if (narrays == 1) return(x)
    # Median absolute value per column, then log-space geometric-mean scaling
    medabs <- function(x) median(abs(as.numeric(x[!is.na(x)])))
    xmat.mav <- log(apply(x, 2, medabs))
    denom <- mean(xmat.mav)
    si <- exp(xmat.mav - denom)
    t(t(x)/si)
}

normalizeMedianDeviations(MA$M[1:5, 1:5])
normalizeMedianDeviations2(MA$M[1:5, 1:5])

# Entire dataset
any(is.infinite(normalizeMedianDeviations(MA$M)))
any(is.infinite(normalizeMedianDeviations2(MA$M)))

On 10/19/05 1:19 PM, "Gordon Smyth" <smyth at wehi.edu.au> wrote:
[...]
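The key step in both rewrites is the identity m/prod(m)^(1/n) ==
exp(log(m) - mean(log(m))): dividing by the geometric mean becomes a
subtraction in log space, so no intermediate product is ever formed. A
quick equivalence check on a small matrix where the original form does not
overflow (a sketch, assuming the definitions above):

  set.seed(1)
  x <- matrix(runif(25, 1, 16), nrow = 5)
  m <- apply(x, 2, median)
  all.equal(m / prod(m)^(1/ncol(x)), exp(log(m) - mean(log(m))))  # TRUE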
Marcus, you're right that normalizeMedianDeviations() should also have the
same modification. In fact, on looking at these functions again after a
couple of years, I realize that normalizeMedians() can only sensibly be
applied to positive matrices, in which case it is the same as
normalizeMedianDeviations(). Hence I have retired both functions and
replaced them with the more numerically robust
normalizeMedianAbsDeviations(). The change is committed to Bioconductor as
limma 2.3.2.

Gordon

At 10:49 AM 19/10/2005, Marcus Davy wrote:
>Hi,
>Does this include modifying normalizeMedianDeviations as well, for
>consistency? I suspect normalizeMedianDeviations doesn't reach R's limit
>as quickly.
>
>Marcus
[...]
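For reference, a log-space replacement along the lines Gordon describes
might look like the sketch below. The name with the "2" suffix is
hypothetical, and the real limma 2.3.2 function may differ in details such
as NA handling and argument names; this merely folds the two retired
functions into one median-absolute-value scaling:

  normalizeMedianAbsDeviations2 <- function(x) {
      # Hypothetical sketch, not the actual limma implementation.
      # Scale each column by its median absolute value, with the scale
      # factors normalized to geometric mean 1, computed in log space.
      narrays <- NCOL(x)
      if (narrays == 1) return(x)
      log.mav <- log(apply(x, 2, function(col) median(abs(col), na.rm = TRUE)))
      si <- exp(log.mav - mean(log.mav))
      t(t(x)/si)
  }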