Question

Re: Limma: four suggestions related to weights and background correction

0

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 4 hours ago

WEHI, Melbourne, Australia

At 03:05 AM 24/01/2004, Ramon Diaz-Uriarte wrote: >Dear Gordon and other bioconductor-users, > >We have encountered a few recurrent problem when using limma. If I may, I'd >like to make four suggestions for changes which might benefit other people: > > >1. When "backgroundCorrect(temp1, method = "none")" is used columns for Rb >and >Gb are eliminated from the RGList object. We have found this problematic, >because, even if one does not want to do background subtraction, one might >still want to look at image plots of background, just to check nothing funny >has happened with the background. The problem is that the subsequent step of >using "normalizeWithinArrays" calls MA.RG, which itself uses the presence or >absence of the Rb and Gb to do (or not) the subtraction. Then, if we still >want to plot background and do the normalization, we either have to make sure >we are done with all background business before we do the normalization >(since after that all background info will be lost) or we need to keep the >normalized and the unnormalized objects around. None of these seem ideal. We >wonder if it would be possible to pass the type of subtraction to >"normalizeWithinArrays" and to "MA.RG" as a parameter, so that the original >object preservers all the information, but still we can do no background >subtraction. (We are hacking the code here, for our internal use). It is certainly a possibility that we will record the type of background correction in the future, probably simply as a character string corresponding to the 'method' argument of backgroundCorrect(). I don't want to store background intensities values as part of normalized data objects though. In this sense I think one should be done with background correction by the time you do the normalization. I am not really sure why you need to hack the code. Why not simply use something like MA <- normalizeWithinArrays(backgroundCorrect(RG,method="none")) In this way, you have normalized without background correcting, but at the same time the background values are preserved in the RGList object. >2. We have found it slightly confussing that the M value of a point with >weight 0 for the normalization is not an NA. Sure, the points with weight = 0 >do not get used to fit the loess curve, but they still have a residual. But, >so far, with the wet lab people we have worked, it has always been the case >that if a point was deemed unsuitable for the normalization it was also >unsuitable for further analyses (though, occasionally, points suitable for >fitting the normalization curve have been deemed unsuitable for further >analyses). > >Are we missing something obvious here? weights=0 are definitely not the same as missing values. Consider the following application for example: suppose you have a set of control spots and you want to normalize all the data to the loess curve going through the control spots. You can simply set the weights=1 for the control spots and zero otherwise to achieve this effect. Introducing NAs would destroy your data of interest. I am against introducing NAs as a general rule unless the data really is missing or is undefined in some intrinsic sense. >3. Occasionally, a complet print-tip group might be missing which will make >"normalizeWithinArrays" to fail. We have added the following to that >function, in the print-tip loess case, right before the line "spots <- spots >+ nspots" (and substituting the direct call to loessFit): > >****************************************** >num.fill <- length(object$M[spots, j]) >object$M[spots, j] <- rep(NA, num.fill) >no.objects <- > try(object$M[spots, j] > <- loessFit(y, x, w, span = span, > iterations = iterations)$residuals) >if(class(no.objects) == "try-error") > print(paste("WARNING: in print-tip loess normalization", > "no samples in array", j, "grid row", gridr, > "grid column", gridc)) >******************************************** That's a good suggestion. I would note though that if this phenomenon occurs for you, you probably shouldn't be using printtiploess normalization in the first place - you should use plain "loess" instead. >4. When all weights are either 0 or 1, there are slight differences in the >results of the normalization (of points with weight = 1) depending on whether >or not the cases with weight of 0 are passed to subsequent calls. Sure, these >differences are irrelevant, but it can be disconcerting. Wouldn't it be >appropriate to check for the binary situation of weights either 0 or 1, and >if so, exclude points with weight 0, which would allow us to call directly >".C("lowess"", instead of ".vsimpleLoess" (inside "loessFit")? There is no perfect solution really. The change that you suggest would mean that very small weights would become discontinuously different from zero weights. The real solution would be to have a lowess function which accepts weights and which interpolates based on x-value distances, but such a function isn't available. I am inclined to leave as is. Thanks for all your suggestions. Gordon >*************** >The following is an example. >(The data sets are at: >http://bioinfo.cnio.es/~rdiaz/00G66.txt) > > >temp01 <- read.maimages(files = "00G66.txt", > source = "genepix", > wt.fun=wtflags(0)) > >temp01.b <- temp01 >set.to.na <- which(temp01.b$weights == 0) >temp01.b$R[set.to.na] <- NA >temp01.b$G[set.to.na] <- NA > >temp01.c <- temp01.b >temp01.c$weights <- NULL > >n1c <- normalizeWithinArrays(temp01.c, layout = temp.layout, > iterations = 5, > method = "loess") >n1b <- normalizeWithinArrays(temp01.b, layout = temp.layout, > iterations = 5, > method = "loess") >n1 <- normalizeWithinArrays(temp01, layout = temp.layout, > iterations = 5, > method = "loess") > >pairs(cbind(n1$M, n1b$M, n1c$M)) >summary(n1$M - n1b$M) > > >****************** > >Best, > >R. > >-- >Ram?n D?az-Uriarte >Bioinformatics Unit >Centro Nacional de Investigaciones Oncol?gicas (CNIO) >(Spanish National Cancer Center) >Melchor Fern?ndez Almagro, 3 >28029 Madrid (Spain) >Fax: +-34-91-224-6972 >Phone: +-34-91-224-6900 > >http://bioinfo.cnio.es/~rdiaz >PGP KeyID: 0xE89B3462 >(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)

Normalization Cancer limma Normalization Cancer limma • 1.4k views

ADD COMMENT • link updated 21.9 years ago by Ramon Diaz ★ 1.1k • written 21.9 years ago by Gordon Smyth 53k

score 0 · Answer 1 · 2004-02-09

Dear Gordon, Thank you very much for your comments. A few answers: > > It is certainly a possibility that we will record the type of background > correction in the future, probably simply as a character string > corresponding to the 'method' argument of backgroundCorrect(). I don't want > to store background intensities values as part of normalized data objects > though. In this sense I think one should be done with background correction > by the time you do the normalization. > > I am not really sure why you need to hack the code. Why not simply use > something like > > MA <- normalizeWithinArrays(backgroundCorrect(RG,method="none")) > > In this way, you have normalized without background correcting, but at the > same time the background values are preserved in the RGList object. Yes but, because of the way we have structured our code (which we'll probably need to change), in addition to the normalization call, we also call MA.RG for some diagnostic plots of foreground and background intensities. Thus, we need to keep the backgroun data around, even if the non- transformed M is obtained without background subtraction (and plotted as such). I am not sure if this situation is common to other people, or if it is just an idiosyncrasy. > > >2. We have found it slightly confussing that the M value of a point with > >weight 0 for the normalization is not an NA. Sure, the points with weight > > = 0 do not get used to fit the loess curve, but they still have a > > residual. But, so far, with the wet lab people we have worked, it has > > always been the case that if a point was deemed unsuitable for the > > normalization it was also unsuitable for further analyses (though, > > occasionally, points suitable for fitting the normalization curve have > > been deemed unsuitable for further analyses). > > > >Are we missing something obvious here? > > weights=0 are definitely not the same as missing values. Consider the > following application for example: suppose you have a set of control spots > and you want to normalize all the data to the loess curve going through the > control spots. You can simply set the weights=1 for the control spots and > zero otherwise to achieve this effect. Introducing NAs would destroy your > data of interest. > > I am against introducing NAs as a general rule unless the data really is > missing or is undefined in some intrinsic sense. Thank you for your comments; I was completely overlooking that possibility (a "local effect" because we rarely have true, legitimate control spots in the arrays we often handle here). > > >3. Occasionally, a complet print-tip group might be missing which will > > make "normalizeWithinArrays" to fail. We have added the following to that > > function, in the print-tip loess case, right before the line "spots <- > > spots + nspots" (and substituting the direct call to loessFit): > > > >****************************************** > >num.fill <- length(object$M[spots, j]) > >object$M[spots, j] <- rep(NA, num.fill) > >no.objects <- > > try(object$M[spots, j] > > <- loessFit(y, x, w, span = span, > > iterations = iterations)$residuals) > >if(class(no.objects) == "try-error") > > print(paste("WARNING: in print-tip loess normalization", > > "no samples in array", j, "grid row", gridr, > > "grid column", gridc)) > >******************************************** > > That's a good suggestion. I would note though that if this phenomenon > occurs for you, you probably shouldn't be using printtiploess normalization > in the first place - you should use plain "loess" instead. I didn't make the context clear: we are using this as part of a web interface for normalization, and having the normalization functions to return an error is undesirable. With the above modification the R run finishes, and the web application is happy, but the user is given a warning, letting her choose to re-do things otherwise. However, regarding the type of normalization, I am not sure I see why, if this happens occasionally, the user should prefer global loess: if a print- tip group in a specific array has been damaged (e.g., a scratch), but the rest of the print-tip groups are OK, and the usual assumptions are believed to hold for each of the other print-tip groups, then I think print-tip loess is justified. > > >4. When all weights are either 0 or 1, there are slight differences in the > >results of the normalization (of points with weight = 1) depending on > > whether or not the cases with weight of 0 are passed to subsequent calls. > > Sure, these differences are irrelevant, but it can be disconcerting. > > Wouldn't it be appropriate to check for the binary situation of weights > > either 0 or 1, and if so, exclude points with weight 0, which would allow > > us to call directly ".C("lowess"", instead of ".vsimpleLoess" (inside > > "loessFit")? > > There is no perfect solution really. The change that you suggest would mean > that very small weights would become discontinuously different from zero > weights. The real solution would be to have a lowess function which accepts > weights and which interpolates based on x-value distances, but such a > function isn't available. I am inclined to leave as is. I see now that my suggestion was simplistic and naive. > > Thanks for all your suggestions. Thanks for all your comments (and, of course, for the limma package in the first place!) R. > Gordon > > >*************** > >The following is an example. > >(The data sets are at: > >http://bioinfo.cnio.es/~rdiaz/00G66.txt) > > > > > >temp01 <- read.maimages(files = "00G66.txt", > > source = "genepix", > > wt.fun=wtflags(0)) > > > >temp01.b <- temp01 > >set.to.na <- which(temp01.b$weights == 0) > >temp01.b$R[set.to.na] <- NA > >temp01.b$G[set.to.na] <- NA > > > >temp01.c <- temp01.b > >temp01.c$weights <- NULL > > > >n1c <- normalizeWithinArrays(temp01.c, layout = temp.layout, > > iterations = 5, > > method = "loess") > >n1b <- normalizeWithinArrays(temp01.b, layout = temp.layout, > > iterations = 5, > > method = "loess") > >n1 <- normalizeWithinArrays(temp01, layout = temp.layout, > > iterations = 5, > > method = "loess") > > > >pairs(cbind(n1$M, n1b$M, n1c$M)) > >summary(n1$M - n1b$M) > > > > > >****************** > > > >Best, > > > >R. > > > >-- > >Ram?n D?az-Uriarte > >Bioinformatics Unit > >Centro Nacional de Investigaciones Oncol?gicas (CNIO) > >(Spanish National Cancer Center) > >Melchor Fern?ndez Almagro, 3 > >28029 Madrid (Spain) > >Fax: +-34-91-224-6972 > >Phone: +-34-91-224-6900 > > > >http://bioinfo.cnio.es/~rdiaz > >PGP KeyID: 0xE89B3462 > >(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc) -- Ram?n D?az-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncol?gicas (CNIO) (Spanish National Cancer Center) Melchor Fern?ndez Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)