Question

summarizing probe intensites before or after normalization- 1. how to do with RMA 2. Opinions?

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

Hi Karl, You have to convert the expression data to a data.frame in order to get a boxplot for each column. boxplot(data.frame(exprs(datrma))) I am shocked at the boxplots for rma(). I have never seen RMA processed data look that different, which makes me wonder what the raw data look like. Also, are you sure these are RMA processed data (e.g., you didn't accidently mask the rma() processed data with other data)? Unless the raw data are completely horrible, I don't know how you could get such disparate results. Best, Jim k. brand wrote: > James, > > Thank you for your fast detailed response. > > At your suggetion i tried your suugestion- ie., > > library(affyPLM) > dat <- ReadAffy() > datrma <- rma(dat, normalize=FALSE) > datrma <- normalize.quantiles(exprs(datrma)) > boxplot(datrma) > > Find attached the boxplot output "RMA then QnormJimscript.jpeg" which > indicates some missing syntax since there is only one box for the 12 > arrays?! In fact in my attempts to realize this endeavour i saw this > ouput too. Unfortunately my lack of R knowledge prevented me getting > around whatever the problem is. If you have a further suggestion im all > ears. > > "data", the result of inheriting my teachers bad habits, is now "dat"... > > Further i attach a boxplot (better than histogram right- "RMA vs Qunt > norm of MAS5 preproced & summed.jpg") comparing the two methods in my > orignal post. When i look at the variation of the RMA processed and > normed data i question how effectively can i compare these arrays with > each. Especially along side the method shown which is MAS5 preprocessed > then quantile normalized using a colleagues script. These arrays look > much more comparable, even ideally so. Why dont i just use this appraoch > you may ask? A: im convinced RMA is a superior preprocessing and > summarizing method...i just need to be able to reconcile the apparent > variation in the final output. Or perhaps, better understand it. > > Your further thoughts, suggests greatly appreciated, > > karl > > > > > > on 9/11/2006 3:42 PM James W. MacDonald said the following: > >> k. brand wrote: >> >>> Dear All, >>> >>> I compared two normalization approaches for an experiment using >>> twelve affy 430-2.0 chips. (histogram plot comparing bith methods >>> forwarded on request). >>> >>> #1. RMA >>> library(affy) >>> data <- ReadAffy() >>> datarma <- rma(data) >>> exprs2excel(datarma, file="dataRMA.csv") >>> >>> Plotting histograms of the output shows arrays NOT perfectly aligning >>> at the means and spreads. >>> >>> I used a custom script to effect a quantile normalization on MAS5 >>> preprocessed but unnormalized data- >>> >>> #2. Mas5 sans interchip normalization >>> library(affy) >>> data <- ReadAffy() >>> datamas5sannorm <- mas5(data, normalize=FALSE) >>> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv") >>> f.qnorm <- function(x,qinit=0.75,perc=100) {... >>> >>> The means and spreads of this normalization approach do align perfectly. >>> >>> THUS- summarizing probe intensites before or after normalization does >>> appear to make a noticeable difference, as may be expected. >>> >>> My questions/requests- >>> >>> 1. Help to effect Bolstad normalization of the RMA preprocessed and >>> summarized data. Whilst I succeed in generating unnormalized RMA >>> preprocessed data with- >>> >>> library(affy) >>> data <- ReadAffy() >>> datarma <- rma(data, normalize=FALSE) >> >> >> Next step would be >> >> datarma <- normalize.quantiles(exprs(datarma)) >> >> also note that 'data' is not a very good variable name, as you are >> masking an existing function. When creating variable names it is often >> enlightening to type the name first at an R prompt to see if you get >> any response. >> >> >>> >>> As a result of my limited R experience, I failed in finding a method >>> to effect Bolstad (quantile) normalization on this output. >>> >>> 2. Thoughts/comments on the benefits/caveats of normalizing before or >>> after summarizing probe intensities. >> >> >> Normalizing after summarization for something like rma() seems >> questionable to me. Since the expression values are based on fitting a >> model to the PM probe values, if you don't normalize first you are >> ignoring any non-biological variability which may end up biasing your >> results. Using median polish for the model fit should help protect >> against this, but I don't know that I would want to take chances. >> >> As an aside, how far off are the histograms? Are you sure that there >> is a reasonable difference? Eyeballing a histogram isn't the best way >> to determine if the mean and variance are different or not. A quick >> run through with some data here shows very little differences: >> >> > eset <- justRMA(filenames=list.celfiles()[1:10]) >> > apply(exprs(eset),2,summary) >> Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7 >> Min. 4.085 4.070 4.091 4.051 4.068 4.090 4.087 >> 1st Qu. 5.835 5.859 5.832 5.812 5.842 5.858 5.852 >> Median 7.079 7.069 7.048 7.061 7.070 7.077 7.080 >> Mean 7.225 7.227 7.224 7.227 7.229 7.225 7.232 >> 3rd Qu. 8.352 8.324 8.351 8.363 8.361 8.330 8.347 >> Max. 14.550 14.440 14.420 14.400 14.490 14.430 14.260 >> >> Best, >> >> Jim >> >> >>> >>> I look forward to any thoughts, advice & suggestions from users. >>> >>> thanks in advance, >>> >>> Karl >>> >>> >>> =========================================== >>> >>> > sessionInfo() >>> Version 2.3.0 (2006-04-24) >>> i386-pc-mingw32 >>> >>> attached base packages: >>> [1] "tools" "methods" "stats" "graphics" "grDevices" "utils" >>> "datasets" "base" >>> >>> other attached packages: >>> affy affyio Biobase >>> "1.10.0" "1.0.0" "1.10.0" >>> >> >> > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.

Microarray Normalization Preprocessing Cancer probe affy convert affyio Microarray Cancer • 2.2k views

ADD COMMENT • link 19.4 years ago James W. MacDonald 68k

score 0 · Answer 1 · 2006-09-11

Hi Karl, Please keep this on the list. Since you have repeated the experiment with new samples, and there is such a huge difference between the two sets, I would tend to separate them (e.g., run RMA separately on the two batches), and then fit a mixed effects model. You can use the GeneMetaEx package to do this. See this post for more info: http://article.gmane.org/gmane.science.biology.informatics.conductor/7 981/match=gentleman+mixed+model There may not be enough replication in each set to accurately estimate the variance/covariance matrix (somebody like Robert Gentleman would be better able to answer that question). Alternatively, if the old arrays are consistently 'dimmer' than the new by a relatively constant amount, you might just be able to fit a fixed effects ANOVA with a batch effect. An example of doing that can be found here, where they fit a 'day' effect: http://bioinf.wehi.edu.au/marray/jsm2005/lab5/lab5.html Best, Jim k. brand wrote: > Jim, > > Yes, i succeeded in normalizing the RMA data with: > > datrma.postqnorm <- normalize(datrma) > > Sorry to shock you...wait till you see the un-normalized data (attached- > "RMA no norm.jpeg"! > > Unfortunately the disparity is technical not biological. The low arrays > were 2 years past expiration, i know the quality is significantly > impaired. However good QPCR results justified my department allowing me > to use some fresh arrays to finish the experiment. > > So i have half my experiment (2 replicates for three tissues) on old > arrays, and the other half (2 replicates for three tissues) on fresh > arrays. The data is ok, i get some pathways at the end which look ok. > BUt i want to wring the most from this imperfect data set. > > And yes, it is definitely the correct RMA output. Normalizing after > summarizing DOES however enforce equal distributions for all arrays (see > attached "RMA then BB norm.jpeg"). Yet i still try to understand what is > the better approach, at least in my "disparate" situation- to normalize > before or after summarization. > > What do you think? > > Karl > > PS running another 2 arrays is not an option.... > > > on 9/11/2006 4:57 PM James W. MacDonald said the following: > >> Hi Karl, >> >> You have to convert the expression data to a data.frame in order to >> get a boxplot for each column. >> >> boxplot(data.frame(exprs(datrma))) >> >> I am shocked at the boxplots for rma(). I have never seen RMA >> processed data look that different, which makes me wonder what the raw >> data look like. Also, are you sure these are RMA processed data (e.g., >> you didn't accidently mask the rma() processed data with other data)? >> Unless the raw data are completely horrible, I don't know how you >> could get such disparate results. >> >> >> >> Best, >> >> Jim >> >> k. brand wrote: >> >>> James, >>> >>> Thank you for your fast detailed response. >>> >>> At your suggetion i tried your suugestion- ie., >>> >>> library(affyPLM) >>> dat <- ReadAffy() >>> datrma <- rma(dat, normalize=FALSE) >>> datrma <- normalize.quantiles(exprs(datrma)) >>> boxplot(datrma) >>> >>> Find attached the boxplot output "RMA then QnormJimscript.jpeg" which >>> indicates some missing syntax since there is only one box for the 12 >>> arrays?! In fact in my attempts to realize this endeavour i saw this >>> ouput too. Unfortunately my lack of R knowledge prevented me getting >>> around whatever the problem is. If you have a further suggestion im >>> all ears. >>> >>> "data", the result of inheriting my teachers bad habits, is now "dat"... >>> >>> Further i attach a boxplot (better than histogram right- "RMA vs Qunt >>> norm of MAS5 preproced & summed.jpg") comparing the two methods in my >>> orignal post. When i look at the variation of the RMA processed and >>> normed data i question how effectively can i compare these arrays >>> with each. Especially along side the method shown which is MAS5 >>> preprocessed then quantile normalized using a colleagues script. >>> These arrays look much more comparable, even ideally so. Why dont i >>> just use this appraoch you may ask? A: im convinced RMA is a superior >>> preprocessing and summarizing method...i just need to be able to >>> reconcile the apparent variation in the final output. Or perhaps, >>> better understand it. >>> >>> Your further thoughts, suggests greatly appreciated, >>> >>> karl >>> >>> >>> >>> >>> >>> on 9/11/2006 3:42 PM James W. MacDonald said the following: >>> >>>> k. brand wrote: >>>> >>>>> Dear All, >>>>> >>>>> I compared two normalization approaches for an experiment using >>>>> twelve affy 430-2.0 chips. (histogram plot comparing bith methods >>>>> forwarded on request). >>>>> >>>>> #1. RMA >>>>> library(affy) >>>>> data <- ReadAffy() >>>>> datarma <- rma(data) >>>>> exprs2excel(datarma, file="dataRMA.csv") >>>>> >>>>> Plotting histograms of the output shows arrays NOT perfectly >>>>> aligning at the means and spreads. >>>>> >>>>> I used a custom script to effect a quantile normalization on MAS5 >>>>> preprocessed but unnormalized data- >>>>> >>>>> #2. Mas5 sans interchip normalization >>>>> library(affy) >>>>> data <- ReadAffy() >>>>> datamas5sannorm <- mas5(data, normalize=FALSE) >>>>> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv") >>>>> f.qnorm <- function(x,qinit=0.75,perc=100) {... >>>>> >>>>> The means and spreads of this normalization approach do align >>>>> perfectly. >>>>> >>>>> THUS- summarizing probe intensites before or after normalization >>>>> does appear to make a noticeable difference, as may be expected. >>>>> >>>>> My questions/requests- >>>>> >>>>> 1. Help to effect Bolstad normalization of the RMA preprocessed and >>>>> summarized data. Whilst I succeed in generating unnormalized RMA >>>>> preprocessed data with- >>>>> >>>>> library(affy) >>>>> data <- ReadAffy() >>>>> datarma <- rma(data, normalize=FALSE) >>>> >>>> >>>> >>>> Next step would be >>>> >>>> datarma <- normalize.quantiles(exprs(datarma)) >>>> >>>> also note that 'data' is not a very good variable name, as you are >>>> masking an existing function. When creating variable names it is >>>> often enlightening to type the name first at an R prompt to see if >>>> you get any response. >>>> >>>> >>>>> >>>>> As a result of my limited R experience, I failed in finding a >>>>> method to effect Bolstad (quantile) normalization on this output. >>>>> >>>>> 2. Thoughts/comments on the benefits/caveats of normalizing before >>>>> or after summarizing probe intensities. >>>> >>>> >>>> >>>> Normalizing after summarization for something like rma() seems >>>> questionable to me. Since the expression values are based on fitting >>>> a model to the PM probe values, if you don't normalize first you are >>>> ignoring any non-biological variability which may end up biasing >>>> your results. Using median polish for the model fit should help >>>> protect against this, but I don't know that I would want to take >>>> chances. >>>> >>>> As an aside, how far off are the histograms? Are you sure that there >>>> is a reasonable difference? Eyeballing a histogram isn't the best >>>> way to determine if the mean and variance are different or not. A >>>> quick run through with some data here shows very little differences: >>>> >>>> > eset <- justRMA(filenames=list.celfiles()[1:10]) >>>> > apply(exprs(eset),2,summary) >>>> Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7 >>>> Min. 4.085 4.070 4.091 4.051 4.068 4.090 4.087 >>>> 1st Qu. 5.835 5.859 5.832 5.812 5.842 5.858 5.852 >>>> Median 7.079 7.069 7.048 7.061 7.070 7.077 7.080 >>>> Mean 7.225 7.227 7.224 7.227 7.229 7.225 7.232 >>>> 3rd Qu. 8.352 8.324 8.351 8.363 8.361 8.330 8.347 >>>> Max. 14.550 14.440 14.420 14.400 14.490 14.430 14.260 >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>>> >>>>> I look forward to any thoughts, advice & suggestions from users. >>>>> >>>>> thanks in advance, >>>>> >>>>> Karl >>>>> >>>>> >>>>> =========================================== >>>>> >>>>> > sessionInfo() >>>>> Version 2.3.0 (2006-04-24) >>>>> i386-pc-mingw32 >>>>> >>>>> attached base packages: >>>>> [1] "tools" "methods" "stats" "graphics" "grDevices" >>>>> "utils" >>>>> "datasets" "base" >>>>> >>>>> other attached packages: >>>>> affy affyio Biobase >>>>> "1.10.0" "1.0.0" "1.10.0" >>>>> >>>> >>>> >>> >> >> > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.