problem normalizeMedians (limma)

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 6 hours ago

WEHI, Melbourne, Australia

>>Date: Mon, 17 Oct 2005 18:24:47 -0400 >>From: Francois Pepin <fpepin at="" cs.mcgill.ca=""> >>Subject: [BioC] problem normalizeMedians (limma) >>To: bioconductor at stat.math.ethz.ch >> >>Hi everyone, >> >>I'm not sure what should be done about it, but I've been surprised with >>the following behavior with normalizeBetweenArrays: >> >>(dat is an MAList here) >> > dat$A[1,1] >>[1] 8.796361 >> > dat <- normalizeBetweenArrays(dat) >> > dat$A[1,1] >>[1] Inf >> >>This is due to the fact that I've got almost 400 arrays here. The >>normalizeMedians(x) (which is called by normalizeBetweenArrays) has the >>following behavior: >> >> a.med <- apply(x, 2, median, na.rm = TRUE) >> a.med <- a.med/(prod(a.med))^(1/narrays) >> >>multiplying 400 relatively small numbers can easily reach R's limit as >>far as doubles are concerned. >> >> > 6^400 >>[1] Inf >> >>Wouldn't it be better to be working in log space instead? Yes, you're right. In limma 2.3.2 I will change (prod(a.med))^(1/narrays) to exp(mean(log(a.med))) which should make floating underflow very unlikely. BTW, the "scale" method has been the default for normalizeBetweenArrays() for historic reasons, inherited from the older sma package. I'm going to change the default to "Aquantile". Gordon >>Francois >> >> > sessionInfo() >>R version 2.1.0, 2005-04-18, x86_64-unknown-linux-gnu >> >>attached base packages: >>[1] "methods" "stats" "graphics" "grDevices" "utils" >>"datasets" >>[7] "base" >> >>other attached packages: >> limma >>"1.9.6"

limma limma • 725 views

ADD COMMENT • link updated 18.6 years ago by Marcus Davy ▴ 680 • written 18.6 years ago by Gordon Smyth 50k

0

Entering edit mode

Marcus Davy ▴ 680

@marcus-davy-374

Last seen 9.7 years ago

Hi, Doe this include modifying normalizeMedianDeviations aswell for consistency. I suspect normalizeMedianDeviations doesn't reach R's limit as quickly. Marcus MA <- new("RGList") n <- 200 genes <- 1000 MA$M <- matrix(runif(n*genes, -8, 8), nr=genes, nc=n) MA$A <- matrix(runif(n*genes, 1,16), nr=genes, nc=n) summary(c(MA$M)) summary(c(MA$A)) any(is.infinite(normalizeMedians(MA$A))) normalizeMedians2 <- function(x) { narrays <- NCOL(x) if (narrays == 1) return(x) a.med <- log(apply(x, 2, median, na.rm = TRUE)) a.med <- exp(a.med - mean(a.med)) t(t(x)/a.med) } # Subset normalizeMedians(MA$A[1:5,1:5]) normalizeMedians2(MA$A[1:5,1:5]) # Entire dataset any(is.infinite(normalizeMedians(MA$A))) any(is.infinite(normalizeMedians2(MA$A))) normalizeMedianDeviations2 <- function(x) { narrays <- NCOL(x) if (narrays == 1) return(x) medabs <- function(x) median(abs(as.numeric(x[!is.na(x)]))) xmat.mav <- log(apply(x, 2, medabs)) denom <- mean(xmat.mav) si <- exp(xmat.mav - denom) t(t(x)/si) } normalizeMedianDeviations(MA$M[1:5, 1:5]) normalizeMedianDeviations2(MA$M[1:5, 1:5]) # Entire dataset any(is.infinite(normalizeMedianDeviations(MA$M))) any(is.infinite(normalizeMedianDeviations2(MA$M))) On 10/19/05 1:19 PM, "Gordon Smyth" <smyth at="" wehi.edu.au=""> wrote: > >>> Date: Mon, 17 Oct 2005 18:24:47 -0400 >>> From: Francois Pepin <fpepin at="" cs.mcgill.ca=""> >>> Subject: [BioC] problem normalizeMedians (limma) >>> To: bioconductor at stat.math.ethz.ch >>> >>> Hi everyone, >>> >>> I'm not sure what should be done about it, but I've been surprised with >>> the following behavior with normalizeBetweenArrays: >>> >>> (dat is an MAList here) >>>> dat$A[1,1] >>> [1] 8.796361 >>>> dat <- normalizeBetweenArrays(dat) >>>> dat$A[1,1] >>> [1] Inf >>> >>> This is due to the fact that I've got almost 400 arrays here. The >>> normalizeMedians(x) (which is called by normalizeBetweenArrays) has the >>> following behavior: >>> >>> a.med <- apply(x, 2, median, na.rm = TRUE) >>> a.med <- a.med/(prod(a.med))^(1/narrays) >>> >>> multiplying 400 relatively small numbers can easily reach R's limit as >>> far as doubles are concerned. >>> >>>> 6^400 >>> [1] Inf >>> >>> Wouldn't it be better to be working in log space instead? > > Yes, you're right. In limma 2.3.2 I will change > > (prod(a.med))^(1/narrays) > > to > > exp(mean(log(a.med))) > > which should make floating underflow very unlikely. > > BTW, the "scale" method has been the default for normalizeBetweenArrays() > for historic reasons, inherited from the older sma package. I'm going to > change the default to "Aquantile". > > Gordon > >>> Francois >>> >>>> sessionInfo() >>> R version 2.1.0, 2005-04-18, x86_64-unknown-linux-gnu >>> >>> attached base packages: >>> [1] "methods" "stats" "graphics" "grDevices" "utils" >>> "datasets" >>> [7] "base" >>> >>> other attached packages: >>> limma >>> "1.9.6" > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor ______________________________________________________ The contents of this e-mail are privileged and/or confidenti...{{dropped}}

ADD COMMENT • link 18.6 years ago Marcus Davy ▴ 680

0

Entering edit mode

Marcus, you're right that normalizeMedianDeviations() should also have the same modification. In fact, on looking at these functions again after a couple of years, I realize that normalizeMedians() can only be sensibly be applied to positive matrices, in which case it is the same as normalizeMedianDeviations(). Hence I have retired both functions and replaced both with the more numerically robust normalizeMedianAbsDeviations(). Change is commited to Bioconductor as limma 2.3.2. Gordon At 10:49 AM 19/10/2005, Marcus Davy wrote: >Hi, >Doe this include modifying normalizeMedianDeviations aswell for consistency. >I suspect normalizeMedianDeviations doesn't reach R's limit as quickly. > >Marcus [...] >On 10/19/05 1:19 PM, "Gordon Smyth" <smyth at="" wehi.edu.au=""> wrote: > >>> Date: Mon, 17 Oct 2005 18:24:47 -0400 > >>> From: Francois Pepin <fpepin at="" cs.mcgill.ca=""> > >>> Subject: [BioC] problem normalizeMedians (limma) > >>> To: bioconductor at stat.math.ethz.ch > >>> > >>> Hi everyone, > >>> > >>> I'm not sure what should be done about it, but I've been surprised with > >>> the following behavior with normalizeBetweenArrays: > >>> > >>> (dat is an MAList here) > >>>> dat$A[1,1] > >>> [1] 8.796361 > >>>> dat <- normalizeBetweenArrays(dat) > >>>> dat$A[1,1] > >>> [1] Inf > >>> > >>> This is due to the fact that I've got almost 400 arrays here. The > >>> normalizeMedians(x) (which is called by normalizeBetweenArrays) has the > >>> following behavior: > >>> > >>> a.med <- apply(x, 2, median, na.rm = TRUE) > >>> a.med <- a.med/(prod(a.med))^(1/narrays) > >>> > >>> multiplying 400 relatively small numbers can easily reach R's limit as > >>> far as doubles are concerned. > >>> > >>>> 6^400 > >>> [1] Inf > >>> > >>> Wouldn't it be better to be working in log space instead? > > > > Yes, you're right. In limma 2.3.2 I will change > > > > (prod(a.med))^(1/narrays) > > > > to > > > > exp(mean(log(a.med))) > > > > which should make floating underflow very unlikely. > > > > BTW, the "scale" method has been the default for normalizeBetweenArrays() > > for historic reasons, inherited from the older sma package. I'm going to > > change the default to "Aquantile". > > > > Gordon > > > >>> Francois [...]

ADD REPLY • link 18.6 years ago Gordon Smyth 50k

Login before adding your answer.