Problem with gcrma


Casper Shyr ▴ 140

@casper-shyr-4113


Hello all,

I am having trouble normalizing with the gcrma package. I start with 4 Affy chip CEL files, read them in, and run gcrma with the default parameters. I then make a boxplot of the normalized data. In the plot, the tail below the 1st quartile is either very short or completely missing. The medians of the assays line up, but each one is very close to its 1st quartile. Judging from the nature of the data, I know this is wrong. I also tested the example Dilution data and got a similar result (i.e. the lower tail is absent).

My code is simply:

    DataWTPBS <- ReadAffy(celfile.path="data/WTPBS/")
    WTPBSgcRMA <- gcrma(DataWTPBS)
    boxplot(exprs(WTPBSgcRMA))

I've checked whether any of my packages need to be updated. I also made a boxplot of the unnormalized data, and it looked fine.

Any suggestions on why this might be the case?

Thank you!

Sincerely,
Casper
University of British Columbia

affy gcrma • 907 views

updated 14.0 years ago by Paul Williams • written 14.0 years ago by Casper Shyr


James W. MacDonald 65k

@james-w-macdonald-5106


United States

Hi Casper.

This isn't a problem with gcrma(); in fact, it is expected. Both RMA and GCRMA try to subtract background from the data without unduly affecting the data from truly expressed genes. After that subtraction, the values from unexpressed genes pile up at the low end of the scale, which is why the lower tail of your boxplot is short or missing.

Rather than using boxplots, it might be instructive for you to look at density plots. As an example using the Dilution data set:

    library(gcrma)
    library(affydata)
    data(Dilution)

    ## ExpressionSet without background correction
    norm <- normalize(Dilution, "quantiles")
    eset.nobg <- computeExprSet(norm, summary.method = "medianpolish",
                                pmcorrect.method = "pmonly")

    ## ExpressionSets with RMA and GCRMA background correction
    eset.rma <- rma(Dilution)
    eset.gcrma <- gcrma(Dilution)

    ## Overlay the densities for the first array
    plot(density(exprs(eset.nobg)[,1]), xlim = c(1,14))  # solid: no background correction
    lines(density(exprs(eset.rma)[,1]), lty=2)           # dashed: RMA
    lines(density(exprs(eset.gcrma)[,1]), lty=3)         # dotted: GCRMA

You can see here that the data without background correction are pretty much a single peak, where it is difficult to distinguish truly expressed data from background. The RMA data (dashed line) look semi-bimodal (and usually look better than this), with some differentiation between background and expressed data. The GCRMA data (dotted line) show a clear separation between data that are assumed to be from unexpressed genes and data from expressed genes.

Best,

Jim

James W. MacDonald, M.S.
Biostatistician, Douglas Lab
University of Michigan Department of Human Genetics
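A toy simulation may help show why a boxplot loses its lower tail in the situation described in the reply above. The sketch below uses only base R and does not run gcrma. The 60/40 split between unexpressed and expressed probesets, the floor value of 2, the distributions, and all variable names are assumptions chosen purely for illustration.

```r
# Toy illustration only: simulated values, not real GCRMA output.
set.seed(1)

floor_value <- 2      # assumed lowest value after background correction
n_background <- 6000  # assumed number of unexpressed probesets
n_expressed <- 4000   # assumed number of expressed probesets

# Unexpressed probesets pile up just above the floor.
background <- floor_value + rexp(n_background, rate = 20)
# Expressed probesets spread over a wide range, never below the floor.
expressed <- pmax(rnorm(n_expressed, mean = 8, sd = 2), floor_value)
simulated <- c(background, expressed)

# boxplot.stats() returns lower whisker, lower hinge, median, upper hinge, upper whisker.
stats <- boxplot.stats(simulated)$stats
names(stats) <- c("lower_whisker", "q1", "median", "q3", "upper_whisker")
print(round(stats, 2))

lower_tail <- unname(stats["q1"] - stats["lower_whisker"])
upper_tail <- unname(stats["upper_whisker"] - stats["q3"])
median_to_q1 <- unname(stats["median"] - stats["q1"])
cat("lower tail:", round(lower_tail, 3), " upper tail:", round(upper_tail, 3), "\n")

# boxplot(simulated)  # uncomment to draw the box itself
```

Because more than half of the simulated values sit just above the floor, both the 1st quartile and the median fall inside that pile-up. The lower whisker therefore has almost no length and the median sits close to the 1st quartile, which matches the pattern in the original question.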

14.0 years ago • James W. MacDonald


Paul Williams ▴ 10

@paul-williams-4117


James, very nice example. I'd like to add a caveat from my experience with density plots of gcrma-processed data.

By default, gcrma() and justGCRMA() use a fast ad hoc algorithm for background adjustment (running args(gcrma) lists many arguments, and the argument fast is set to TRUE by default). I have found that density plots of data processed with this ad hoc method can be somewhat misleading. If you run:

    x <- hist(exprs(eset.gcrma)[,1], breaks=seq(0,14,by=0.01))
    plot(x$mids, x$counts, log='y')  # hist() stores the bin frequencies in $counts

you can see that the background peak in the density plot is not a smooth Gaussian but looks more like an exponential: the smallest value has the highest frequency, and the frequency decreases as the expression value increases.

Interestingly, when you run gcrma or justGCRMA with fast=FALSE (using the empirical Bayesian background adjustment), the density plot more closely matches the actual distribution for the background peak. Incidentally, with regard to Casper's question, boxplots of the expression values then get a lower tail again. However, I don't know if there's consensus on which algorithm is more desirable, all other things being equal.

Paul Williams, Ph.D.
Bioinformatics Scientist
Compendia Bioscience, Inc.
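The caveat above, that a smoothed density can hide the shape of a pile-up at the low end, can be reproduced with base R on simulated data. This is a minimal sketch and not real GCRMA output: the exponential shape follows the description in the post, while the floor value of 2, the rate, and the variable names are assumptions for illustration.

```r
# Toy illustration only: simulated background values, not real GCRMA output.
set.seed(1)

floor_value <- 2  # assumed lowest value after background adjustment
values <- floor_value + rexp(6000, rate = 20)  # frequency falls off from the floor upward

# Fine-binned histogram (bin width 0.01, as in the post), computed without plotting.
h <- hist(values,
          breaks = seq(floor_value, max(values) + 0.01, by = 0.01),
          plot = FALSE)

# Kernel density estimate with the default Gaussian kernel.
d <- density(values)

peak_bin_index <- which.max(h$counts)   # which histogram bin is the fullest
density_peak_x <- d$x[which.max(d$y)]   # where the smoothed curve peaks
grid_step <- d$x[2] - d$x[1]
mass_below_floor <- sum(d$y[d$x < floor_value]) * grid_step  # estimated density below the floor

cat("fullest histogram bin:", peak_bin_index, "\n")
cat("smallest value:", min(values), " density peak at:", density_peak_x, "\n")
cat("estimated probability mass below the floor:", mass_below_floor, "\n")

# plot(h$mids, h$counts, log = "y")  # uncomment to see the decay on a log scale
```

In this simulation the fullest histogram bin is the first one, yet the kernel estimate peaks to the right of the smallest value and assigns some density below the floor, where no data exist. That smoothing is what makes an exponential-like pile-up resemble a rounded peak in a density plot.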

14.0 years ago • Paul Williams
