Question: Question about quantile normalization and NA value
5.2 years ago by
H@mamba.fhcrc.org10 wrote:
Dear all,

I have a question about quantile normalization and NA values.

I am going to normalize microarray data with "normalizeBetweenArrays", which is the quantile normalization function in the "limma" package. I normalized a data set containing an NA as follows:

> x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300),4,3)
> colnames(x) <- paste("Chip",1:3, sep="")
> rownames(x) <- c("RNA-A","RNA-B","RNA-C","RNA-D")
>
> x
      Chip1 Chip2 Chip3
RNA-A   100 110.0   120
RNA-B    15  16.5    18
RNA-C   200 220.0   240
RNA-D   250 275.0   300
>
> normalizeBetweenArrays(x)
      Chip1 Chip2 Chip3
RNA-A 110.0 110.0 110.0
RNA-B  16.5  16.5  16.5
RNA-C 220.0 220.0 220.0
RNA-D 275.0 275.0 275.0
>
> y <- x
> y[2,2] <- NA
>
> normalizeBetweenArrays(y)
          Chip1     Chip2     Chip3
RNA-A 134.44444  47.66667 134.44444
RNA-B  47.66667        NA  47.66667
RNA-C 226.11111 180.27778 226.11111
RNA-D 275.00000 275.00000 275.00000

It seems to me that the normalized y is quite far from the normalized x. Can a single NA really induce such a large effect?
Should I normalize after replacing the NA with some value, such as median(x[2,],na.rm=T)?
My environment is limma version 3.16.6, R version 3.0.1.

Thanks

-- Sent via the guest posting facility at bioconductor.org.
microarray normalization limma • 2.3k views
modified 5.2 years ago by godahajime20 • written 5.2 years ago by H@mamba.fhcrc.org10
5.2 years ago by
Denali
Steve Lianoglou12k wrote:
Hi,

On Tue, Jan 21, 2014 at 5:03 AM, <H at mamba.fhcrc.org> wrote:
> It seems to me that the normalized y is quite far from the normalized x. Can a single NA really induce such a large effect?

I suspect that this is only because you are doing the normalization over a very small dataset. With four observations per "array", 25% of your data on Chip2 is missing, so a change in a single data point has a larger effect than it would on your real array (which would have thousands of observations per array). Of course, if 25% of the values on one of your real arrays were NA, you might consider failing that array anyway ;-)

> Should I normalize after replacing the NA with some value, such as median(x[2,],na.rm=T)?

I'd think not. If you are analyzing a commercial array, just stick with the prescribed steps you find in some of the many tutorials available (in limma or other Bioconductor tutorials). If you have a custom array, more care will be needed.

-steve

--
Steve Lianoglou
Computational Biologist
Genentech
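To get a rough feel for the sample-size argument, here is a minimal base R sketch. It does not call limma; it only assumes that a column with an NA is represented by its sorted non-missing values, linearly interpolated back to full length. The helpers stretch() and max_shift() and the 2000-probe column are made up for illustration.

```r
# Hypothetical helper: stretch the sorted non-missing values of one column
# onto n evenly spaced positions in [0, 1] by linear interpolation.
stretch <- function(col, n = length(col)) {
  s <- sort(col)                          # sort() drops NA values by default
  k <- length(s)
  approx((seq_len(k) - 1) / (k - 1), s, xout = (seq_len(n) - 1) / (n - 1))$y
}

# Hypothetical helper: largest shift in a column's quantiles when its
# smallest value is replaced by NA.
max_shift <- function(col) {
  with_na <- col
  with_na[which.min(col)] <- NA
  max(abs(stretch(with_na) - sort(col)))
}

small <- c(110, 16.5, 220, 275)                            # Chip2 from the question
large <- exp(seq(log(16.5), log(275), length.out = 2000))  # made-up 2000-probe column

shift_small <- max_shift(small)
shift_large <- max_shift(large)
print(c(small = shift_small, large = shift_large))
```

With four probes the lowest quantile jumps by 93.5 (from 16.5 to 110), whereas for the made-up 2000-probe column covering the same range the largest shift stays well under one intensity unit.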
5.2 years ago by
godahajime20
godahajime20 wrote:
Dr Steve Lianoglou,

Thanks for your reply.

As you mentioned, the sample size in my example is too small. That issue can be set aside, because the actual data set is larger than 2000 x 300.

I read the limma tutorial and the source code of "normalizeBetweenArrays"; however, I couldn't understand how NA values are processed. Could you show me the process?

Thanks,
Hi,

On Wed, Jan 22, 2014 at 3:43 AM, godahajime <godahajime at zoho.com> wrote:
> I read the limma tutorial and the source code of "normalizeBetweenArrays"; however, I couldn't understand how NA values are processed.
> Could you show me the process?

They are handled "very carefully" ;-)

The function that actually does the quantile normalization is limma::normalizeQuantiles. If you *really* want to understand what is happening there, I suggest you:

(1) download the source code for limma

(2) open the limma/R/norm.R file and jump to the normalizeQuantiles function

(3) reconstruct the parameters required to run the function, i.e.:

    (a) Create a test matrix with some (5) data points missing:

        R> A <- matrix(rnorm(50), nrow=10)
        R> A[sample(50, 5)] <- NA

    (b) Create a ties variable:

        R> ties <- TRUE

(4) Now step through the code

As you step through the code, take a careful look at what each line produces -- you will likely get tripped up by some of the code there, but read the documentation (I'm sure you will have to read ?approx, for instance).

If you really care to know how NAs are accounted for, that's how you would go about doing it. Others are happy enough to know that they are more or less ignored and accounted for, and that's that.

It is a good exercise to do for yourself, either way, as performing these exercises for several different "well travelled" packages is a great way to learn how to code in R, as well as tricks-of-the-trade related to programming/computing w/ data in general.

Enjoy,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech
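Since ?approx is likely to be the sticking point, here is a small base R illustration of what approx() does, using Chip2 from the original question. The variable names are made up, and using approx() to spread three sorted values over a four-point grid is an assumption about how such a step could work, not a quotation of limma's source.

```r
# Sorted non-missing values of Chip2 after y[2,2] <- NA: three values, not four.
chip2_sorted <- sort(c(110, NA, 220, 275))   # sort() drops the NA

# Positions of the three known values on a 0..1 scale, and the four-point
# grid that a complete column would occupy.
pos_have <- c(0, 0.5, 1)
pos_want <- c(0, 1/3, 2/3, 1)

# approx() interpolates linearly between the known (position, value) pairs
# and returns a list; the interpolated values are in $y.
chip2_full <- approx(pos_have, chip2_sorted, xout = pos_want)$y
print(chip2_full)
# 110.0000 183.3333 238.3333 275.0000
```

The endpoints are kept as they are, and the two interior values are read off the straight lines joining 110 to 220 and 220 to 275.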
5.2 years ago by
godahajime20
godahajime20 wrote:
Dr Lianoglou,

I truly appreciate your kind response. approx() seems to be beyond my ability for now; however, I will take on the challenge.

Professor Smyth, Dr Bolstad,

I have been treating flagged spots as NA. However, supposing that the intensities of flagged spots are reliable to a certain degree, I might leave them intact.

Thanks,
5.2 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:
The meaning of quantile normalization with NAs has never been agreed on in a refereed publication, as far as I know. I implemented the limma version long ago, and as far as I know it was the first implementation of quantile normalization to allow NAs. Ben Bolstad implemented a somewhat different algorithm in the affy package. Ben's version is now in the preprocessCore package as normalize.quantiles().

The result you have is correct according to limma's algorithm, which involves interpolating each column of non-missing values out to a full-length vector when computing the mean quantiles. The reason the NA makes a big difference is that it changes the minimum quantile for column 2 from 16.5 to 110, a big change.

As an alternative, you might try Ben's algorithm:

    library(preprocessCore)
    normalize.quantiles(y)

But replacing NAs with row medians would not in general be sufficient.

Best wishes
Gordon

> Date: Tue, 21 Jan 2014 05:03:17 -0800 (PST)
> From: H at mamba.fhcrc.org, "K [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, godahajime at zoho.com
> Subject: [BioC] Question about quantile normalization and NA value
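The description above can be turned into a short base R sketch. This is a simplified reconstruction from the wording in this thread, not limma's source: the name quantile_norm_na is made up, tie handling is reduced to R's default average ranks, and every column is assumed to have at least two non-missing values. On the example from the question it gives the same numbers as the posted normalizeBetweenArrays output.

```r
# Simplified sketch of NA-aware quantile normalization, written from the
# description in this thread. quantile_norm_na is a hypothetical name;
# this is NOT limma's source and needs >= 2 non-missing values per column.
quantile_norm_na <- function(A) {
  n <- nrow(A)
  grid <- (seq_len(n) - 1) / (n - 1)      # n evenly spaced positions in [0, 1]

  # Step 1: sort each column's non-missing values and interpolate them
  # out to a full-length vector on the common grid.
  S <- apply(A, 2, function(col) {
    s <- sort(col)                        # sort() drops NA values
    k <- length(s)
    approx((seq_len(k) - 1) / (k - 1), s, xout = grid)$y
  })

  # Step 2: the target distribution is the row-wise mean of those quantiles.
  target <- rowMeans(S)

  # Step 3: give every observed value the target quantile at its own rank,
  # with ranks rescaled to [0, 1] among the non-missing values only.
  for (j in seq_len(ncol(A))) {
    ok <- !is.na(A[, j])
    r <- rank(A[ok, j])
    A[ok, j] <- approx(grid, target, xout = (r - 1) / (sum(ok) - 1))$y
  }
  A
}

x <- matrix(c(100,15,200,250,110,16.5,220,275,120,18,240,300), 4, 3)
y <- x
y[2, 2] <- NA
norm_x <- quantile_norm_na(x)
norm_y <- quantile_norm_na(y)
print(round(norm_y, 5))
```

In this sketch the three remaining Chip2 values are stretched to 110, 183.33, 238.33 and 275, so the lowest mean quantile becomes (15 + 110 + 18) / 3 = 47.67 instead of 16.5, which is where the large change in the first two rows comes from.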