summarizing probe intensites before or after normalization- 1. how to do with RMA 2. Opinions?
1
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States
Hi Karl, You have to convert the expression data to a data.frame in order to get a boxplot for each column. boxplot(data.frame(exprs(datrma))) I am shocked at the boxplots for rma(). I have never seen RMA processed data look that different, which makes me wonder what the raw data look like. Also, are you sure these are RMA processed data (e.g., you didn't accidently mask the rma() processed data with other data)? Unless the raw data are completely horrible, I don't know how you could get such disparate results. Best, Jim k. brand wrote: > James, > > Thank you for your fast detailed response. > > At your suggetion i tried your suugestion- ie., > > library(affyPLM) > dat <- ReadAffy() > datrma <- rma(dat, normalize=FALSE) > datrma <- normalize.quantiles(exprs(datrma)) > boxplot(datrma) > > Find attached the boxplot output "RMA then QnormJimscript.jpeg" which > indicates some missing syntax since there is only one box for the 12 > arrays?! In fact in my attempts to realize this endeavour i saw this > ouput too. Unfortunately my lack of R knowledge prevented me getting > around whatever the problem is. If you have a further suggestion im all > ears. > > "data", the result of inheriting my teachers bad habits, is now "dat"... > > Further i attach a boxplot (better than histogram right- "RMA vs Qunt > norm of MAS5 preproced & summed.jpg") comparing the two methods in my > orignal post. When i look at the variation of the RMA processed and > normed data i question how effectively can i compare these arrays with > each. Especially along side the method shown which is MAS5 preprocessed > then quantile normalized using a colleagues script. These arrays look > much more comparable, even ideally so. Why dont i just use this appraoch > you may ask? A: im convinced RMA is a superior preprocessing and > summarizing method...i just need to be able to reconcile the apparent > variation in the final output. Or perhaps, better understand it. > > Your further thoughts, suggests greatly appreciated, > > karl > > > > > > on 9/11/2006 3:42 PM James W. MacDonald said the following: > >> k. brand wrote: >> >>> Dear All, >>> >>> I compared two normalization approaches for an experiment using >>> twelve affy 430-2.0 chips. (histogram plot comparing bith methods >>> forwarded on request). >>> >>> #1. RMA >>> library(affy) >>> data <- ReadAffy() >>> datarma <- rma(data) >>> exprs2excel(datarma, file="dataRMA.csv") >>> >>> Plotting histograms of the output shows arrays NOT perfectly aligning >>> at the means and spreads. >>> >>> I used a custom script to effect a quantile normalization on MAS5 >>> preprocessed but unnormalized data- >>> >>> #2. Mas5 sans interchip normalization >>> library(affy) >>> data <- ReadAffy() >>> datamas5sannorm <- mas5(data, normalize=FALSE) >>> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv") >>> f.qnorm <- function(x,qinit=0.75,perc=100) {... >>> >>> The means and spreads of this normalization approach do align perfectly. >>> >>> THUS- summarizing probe intensites before or after normalization does >>> appear to make a noticeable difference, as may be expected. >>> >>> My questions/requests- >>> >>> 1. Help to effect Bolstad normalization of the RMA preprocessed and >>> summarized data. Whilst I succeed in generating unnormalized RMA >>> preprocessed data with- >>> >>> library(affy) >>> data <- ReadAffy() >>> datarma <- rma(data, normalize=FALSE) >> >> >> Next step would be >> >> datarma <- normalize.quantiles(exprs(datarma)) >> >> also note that 'data' is not a very good variable name, as you are >> masking an existing function. When creating variable names it is often >> enlightening to type the name first at an R prompt to see if you get >> any response. >> >> >>> >>> As a result of my limited R experience, I failed in finding a method >>> to effect Bolstad (quantile) normalization on this output. >>> >>> 2. Thoughts/comments on the benefits/caveats of normalizing before or >>> after summarizing probe intensities. >> >> >> Normalizing after summarization for something like rma() seems >> questionable to me. Since the expression values are based on fitting a >> model to the PM probe values, if you don't normalize first you are >> ignoring any non-biological variability which may end up biasing your >> results. Using median polish for the model fit should help protect >> against this, but I don't know that I would want to take chances. >> >> As an aside, how far off are the histograms? Are you sure that there >> is a reasonable difference? Eyeballing a histogram isn't the best way >> to determine if the mean and variance are different or not. A quick >> run through with some data here shows very little differences: >> >> > eset <- justRMA(filenames=list.celfiles()[1:10]) >> > apply(exprs(eset),2,summary) >> Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7 >> Min. 4.085 4.070 4.091 4.051 4.068 4.090 4.087 >> 1st Qu. 5.835 5.859 5.832 5.812 5.842 5.858 5.852 >> Median 7.079 7.069 7.048 7.061 7.070 7.077 7.080 >> Mean 7.225 7.227 7.224 7.227 7.229 7.225 7.232 >> 3rd Qu. 8.352 8.324 8.351 8.363 8.361 8.330 8.347 >> Max. 14.550 14.440 14.420 14.400 14.490 14.430 14.260 >> >> Best, >> >> Jim >> >> >>> >>> I look forward to any thoughts, advice & suggestions from users. >>> >>> thanks in advance, >>> >>> Karl >>> >>> >>> =========================================== >>> >>> > sessionInfo() >>> Version 2.3.0 (2006-04-24) >>> i386-pc-mingw32 >>> >>> attached base packages: >>> [1] "tools" "methods" "stats" "graphics" "grDevices" "utils" >>> "datasets" "base" >>> >>> other attached packages: >>> affy affyio Biobase >>> "1.10.0" "1.0.0" "1.10.0" >>> >> >> > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.
Microarray Normalization Preprocessing Cancer probe affy convert affyio Microarray Cancer • 1.8k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States
Hi Karl, Please keep this on the list. Since you have repeated the experiment with new samples, and there is such a huge difference between the two sets, I would tend to separate them (e.g., run RMA separately on the two batches), and then fit a mixed effects model. You can use the GeneMetaEx package to do this. See this post for more info: http://article.gmane.org/gmane.science.biology.informatics.conductor/7 981/match=gentleman+mixed+model There may not be enough replication in each set to accurately estimate the variance/covariance matrix (somebody like Robert Gentleman would be better able to answer that question). Alternatively, if the old arrays are consistently 'dimmer' than the new by a relatively constant amount, you might just be able to fit a fixed effects ANOVA with a batch effect. An example of doing that can be found here, where they fit a 'day' effect: http://bioinf.wehi.edu.au/marray/jsm2005/lab5/lab5.html Best, Jim k. brand wrote: > Jim, > > Yes, i succeeded in normalizing the RMA data with: > > datrma.postqnorm <- normalize(datrma) > > Sorry to shock you...wait till you see the un-normalized data (attached- > "RMA no norm.jpeg"! > > Unfortunately the disparity is technical not biological. The low arrays > were 2 years past expiration, i know the quality is significantly > impaired. However good QPCR results justified my department allowing me > to use some fresh arrays to finish the experiment. > > So i have half my experiment (2 replicates for three tissues) on old > arrays, and the other half (2 replicates for three tissues) on fresh > arrays. The data is ok, i get some pathways at the end which look ok. > BUt i want to wring the most from this imperfect data set. > > And yes, it is definitely the correct RMA output. Normalizing after > summarizing DOES however enforce equal distributions for all arrays (see > attached "RMA then BB norm.jpeg"). Yet i still try to understand what is > the better approach, at least in my "disparate" situation- to normalize > before or after summarization. > > What do you think? > > Karl > > PS running another 2 arrays is not an option.... > > > on 9/11/2006 4:57 PM James W. MacDonald said the following: > >> Hi Karl, >> >> You have to convert the expression data to a data.frame in order to >> get a boxplot for each column. >> >> boxplot(data.frame(exprs(datrma))) >> >> I am shocked at the boxplots for rma(). I have never seen RMA >> processed data look that different, which makes me wonder what the raw >> data look like. Also, are you sure these are RMA processed data (e.g., >> you didn't accidently mask the rma() processed data with other data)? >> Unless the raw data are completely horrible, I don't know how you >> could get such disparate results. >> >> >> >> Best, >> >> Jim >> >> k. brand wrote: >> >>> James, >>> >>> Thank you for your fast detailed response. >>> >>> At your suggetion i tried your suugestion- ie., >>> >>> library(affyPLM) >>> dat <- ReadAffy() >>> datrma <- rma(dat, normalize=FALSE) >>> datrma <- normalize.quantiles(exprs(datrma)) >>> boxplot(datrma) >>> >>> Find attached the boxplot output "RMA then QnormJimscript.jpeg" which >>> indicates some missing syntax since there is only one box for the 12 >>> arrays?! In fact in my attempts to realize this endeavour i saw this >>> ouput too. Unfortunately my lack of R knowledge prevented me getting >>> around whatever the problem is. If you have a further suggestion im >>> all ears. >>> >>> "data", the result of inheriting my teachers bad habits, is now "dat"... >>> >>> Further i attach a boxplot (better than histogram right- "RMA vs Qunt >>> norm of MAS5 preproced & summed.jpg") comparing the two methods in my >>> orignal post. When i look at the variation of the RMA processed and >>> normed data i question how effectively can i compare these arrays >>> with each. Especially along side the method shown which is MAS5 >>> preprocessed then quantile normalized using a colleagues script. >>> These arrays look much more comparable, even ideally so. Why dont i >>> just use this appraoch you may ask? A: im convinced RMA is a superior >>> preprocessing and summarizing method...i just need to be able to >>> reconcile the apparent variation in the final output. Or perhaps, >>> better understand it. >>> >>> Your further thoughts, suggests greatly appreciated, >>> >>> karl >>> >>> >>> >>> >>> >>> on 9/11/2006 3:42 PM James W. MacDonald said the following: >>> >>>> k. brand wrote: >>>> >>>>> Dear All, >>>>> >>>>> I compared two normalization approaches for an experiment using >>>>> twelve affy 430-2.0 chips. (histogram plot comparing bith methods >>>>> forwarded on request). >>>>> >>>>> #1. RMA >>>>> library(affy) >>>>> data <- ReadAffy() >>>>> datarma <- rma(data) >>>>> exprs2excel(datarma, file="dataRMA.csv") >>>>> >>>>> Plotting histograms of the output shows arrays NOT perfectly >>>>> aligning at the means and spreads. >>>>> >>>>> I used a custom script to effect a quantile normalization on MAS5 >>>>> preprocessed but unnormalized data- >>>>> >>>>> #2. Mas5 sans interchip normalization >>>>> library(affy) >>>>> data <- ReadAffy() >>>>> datamas5sannorm <- mas5(data, normalize=FALSE) >>>>> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv") >>>>> f.qnorm <- function(x,qinit=0.75,perc=100) {... >>>>> >>>>> The means and spreads of this normalization approach do align >>>>> perfectly. >>>>> >>>>> THUS- summarizing probe intensites before or after normalization >>>>> does appear to make a noticeable difference, as may be expected. >>>>> >>>>> My questions/requests- >>>>> >>>>> 1. Help to effect Bolstad normalization of the RMA preprocessed and >>>>> summarized data. Whilst I succeed in generating unnormalized RMA >>>>> preprocessed data with- >>>>> >>>>> library(affy) >>>>> data <- ReadAffy() >>>>> datarma <- rma(data, normalize=FALSE) >>>> >>>> >>>> >>>> Next step would be >>>> >>>> datarma <- normalize.quantiles(exprs(datarma)) >>>> >>>> also note that 'data' is not a very good variable name, as you are >>>> masking an existing function. When creating variable names it is >>>> often enlightening to type the name first at an R prompt to see if >>>> you get any response. >>>> >>>> >>>>> >>>>> As a result of my limited R experience, I failed in finding a >>>>> method to effect Bolstad (quantile) normalization on this output. >>>>> >>>>> 2. Thoughts/comments on the benefits/caveats of normalizing before >>>>> or after summarizing probe intensities. >>>> >>>> >>>> >>>> Normalizing after summarization for something like rma() seems >>>> questionable to me. Since the expression values are based on fitting >>>> a model to the PM probe values, if you don't normalize first you are >>>> ignoring any non-biological variability which may end up biasing >>>> your results. Using median polish for the model fit should help >>>> protect against this, but I don't know that I would want to take >>>> chances. >>>> >>>> As an aside, how far off are the histograms? Are you sure that there >>>> is a reasonable difference? Eyeballing a histogram isn't the best >>>> way to determine if the mean and variance are different or not. A >>>> quick run through with some data here shows very little differences: >>>> >>>> > eset <- justRMA(filenames=list.celfiles()[1:10]) >>>> > apply(exprs(eset),2,summary) >>>> Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7 >>>> Min. 4.085 4.070 4.091 4.051 4.068 4.090 4.087 >>>> 1st Qu. 5.835 5.859 5.832 5.812 5.842 5.858 5.852 >>>> Median 7.079 7.069 7.048 7.061 7.070 7.077 7.080 >>>> Mean 7.225 7.227 7.224 7.227 7.229 7.225 7.232 >>>> 3rd Qu. 8.352 8.324 8.351 8.363 8.361 8.330 8.347 >>>> Max. 14.550 14.440 14.420 14.400 14.490 14.430 14.260 >>>> >>>> Best, >>>> >>>> Jim >>>> >>>> >>>>> >>>>> I look forward to any thoughts, advice & suggestions from users. >>>>> >>>>> thanks in advance, >>>>> >>>>> Karl >>>>> >>>>> >>>>> =========================================== >>>>> >>>>> > sessionInfo() >>>>> Version 2.3.0 (2006-04-24) >>>>> i386-pc-mingw32 >>>>> >>>>> attached base packages: >>>>> [1] "tools" "methods" "stats" "graphics" "grDevices" >>>>> "utils" >>>>> "datasets" "base" >>>>> >>>>> other attached packages: >>>>> affy affyio Biobase >>>>> "1.10.0" "1.0.0" "1.10.0" >>>>> >>>> >>>> >>> >> >> > -- James W. MacDonald, M.S. Biostatistician Affymetrix and cDNA Microarray Core University of Michigan Cancer Center 1500 E. Medical Center Drive 7410 CCGC Ann Arbor MI 48109 734-647-5623 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.
ADD COMMENT
0
Entering edit mode
Thank you Jim, Given the QPCR results, i believe the differences are consistent; so i will definitely follow up your suggestions. Apologies for 'falling off' the list, the jpeg attachments kept bouncing. One way or another will stick to it in future. Again, thank you for your thoughts, Karl on 9/11/2006 5:37 PM James W. MacDonald said the following: > Hi Karl, > > Please keep this on the list. > > Since you have repeated the experiment with new samples, and there is > such a huge difference between the two sets, I would tend to separate > them (e.g., run RMA separately on the two batches), and then fit a mixed > effects model. You can use the GeneMetaEx package to do this. See this > post for more info: > > http://article.gmane.org/gmane.science.biology.informatics.conductor /7981/match=gentleman+mixed+model > > > There may not be enough replication in each set to accurately estimate > the variance/covariance matrix (somebody like Robert Gentleman would be > better able to answer that question). > > Alternatively, if the old arrays are consistently 'dimmer' than the new > by a relatively constant amount, you might just be able to fit a fixed > effects ANOVA with a batch effect. An example of doing that can be found > here, where they fit a 'day' effect: > > http://bioinf.wehi.edu.au/marray/jsm2005/lab5/lab5.html > > Best, > > Jim > > k. brand wrote: >> Jim, >> >> Yes, i succeeded in normalizing the RMA data with: >> >> datrma.postqnorm <- normalize(datrma) >> >> Sorry to shock you...wait till you see the un-normalized data >> (attached- "RMA no norm.jpeg"! >> >> Unfortunately the disparity is technical not biological. The low >> arrays were 2 years past expiration, i know the quality is >> significantly impaired. However good QPCR results justified my >> department allowing me to use some fresh arrays to finish the experiment. >> >> So i have half my experiment (2 replicates for three tissues) on old >> arrays, and the other half (2 replicates for three tissues) on fresh >> arrays. The data is ok, i get some pathways at the end which look ok. >> BUt i want to wring the most from this imperfect data set. >> >> And yes, it is definitely the correct RMA output. Normalizing after >> summarizing DOES however enforce equal distributions for all arrays >> (see attached "RMA then BB norm.jpeg"). Yet i still try to understand >> what is the better approach, at least in my "disparate" situation- to >> normalize before or after summarization. >> >> What do you think? >> >> Karl >> >> PS running another 2 arrays is not an option.... >> >> >> on 9/11/2006 4:57 PM James W. MacDonald said the following: >> >>> Hi Karl, >>> >>> You have to convert the expression data to a data.frame in order to >>> get a boxplot for each column. >>> >>> boxplot(data.frame(exprs(datrma))) >>> >>> I am shocked at the boxplots for rma(). I have never seen RMA >>> processed data look that different, which makes me wonder what the >>> raw data look like. Also, are you sure these are RMA processed data >>> (e.g., you didn't accidently mask the rma() processed data with other >>> data)? Unless the raw data are completely horrible, I don't know how >>> you could get such disparate results. >>> >>> >>> >>> Best, >>> >>> Jim >>> >>> k. brand wrote: >>> >>>> James, >>>> >>>> Thank you for your fast detailed response. >>>> >>>> At your suggetion i tried your suugestion- ie., >>>> >>>> library(affyPLM) >>>> dat <- ReadAffy() >>>> datrma <- rma(dat, normalize=FALSE) >>>> datrma <- normalize.quantiles(exprs(datrma)) >>>> boxplot(datrma) >>>> >>>> Find attached the boxplot output "RMA then QnormJimscript.jpeg" >>>> which indicates some missing syntax since there is only one box for >>>> the 12 arrays?! In fact in my attempts to realize this endeavour i >>>> saw this ouput too. Unfortunately my lack of R knowledge prevented >>>> me getting around whatever the problem is. If you have a further >>>> suggestion im all ears. >>>> >>>> "data", the result of inheriting my teachers bad habits, is now >>>> "dat"... >>>> >>>> Further i attach a boxplot (better than histogram right- "RMA vs >>>> Qunt norm of MAS5 preproced & summed.jpg") comparing the two methods >>>> in my orignal post. When i look at the variation of the RMA >>>> processed and normed data i question how effectively can i compare >>>> these arrays with each. Especially along side the method shown which >>>> is MAS5 preprocessed then quantile normalized using a colleagues >>>> script. These arrays look much more comparable, even ideally so. Why >>>> dont i just use this appraoch you may ask? A: im convinced RMA is a >>>> superior preprocessing and summarizing method...i just need to be >>>> able to reconcile the apparent variation in the final output. Or >>>> perhaps, better understand it. >>>> >>>> Your further thoughts, suggests greatly appreciated, >>>> >>>> karl >>>> >>>> >>>> >>>> >>>> >>>> on 9/11/2006 3:42 PM James W. MacDonald said the following: >>>> >>>>> k. brand wrote: >>>>> >>>>>> Dear All, >>>>>> >>>>>> I compared two normalization approaches for an experiment using >>>>>> twelve affy 430-2.0 chips. (histogram plot comparing bith methods >>>>>> forwarded on request). >>>>>> >>>>>> #1. RMA >>>>>> library(affy) >>>>>> data <- ReadAffy() >>>>>> datarma <- rma(data) >>>>>> exprs2excel(datarma, file="dataRMA.csv") >>>>>> >>>>>> Plotting histograms of the output shows arrays NOT perfectly >>>>>> aligning at the means and spreads. >>>>>> >>>>>> I used a custom script to effect a quantile normalization on MAS5 >>>>>> preprocessed but unnormalized data- >>>>>> >>>>>> #2. Mas5 sans interchip normalization >>>>>> library(affy) >>>>>> data <- ReadAffy() >>>>>> datamas5sannorm <- mas5(data, normalize=FALSE) >>>>>> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv") >>>>>> f.qnorm <- function(x,qinit=0.75,perc=100) {... >>>>>> >>>>>> The means and spreads of this normalization approach do align >>>>>> perfectly. >>>>>> >>>>>> THUS- summarizing probe intensites before or after normalization >>>>>> does appear to make a noticeable difference, as may be expected. >>>>>> >>>>>> My questions/requests- >>>>>> >>>>>> 1. Help to effect Bolstad normalization of the RMA preprocessed >>>>>> and summarized data. Whilst I succeed in generating unnormalized >>>>>> RMA preprocessed data with- >>>>>> >>>>>> library(affy) >>>>>> data <- ReadAffy() >>>>>> datarma <- rma(data, normalize=FALSE) >>>>> >>>>> >>>>> >>>>> Next step would be >>>>> >>>>> datarma <- normalize.quantiles(exprs(datarma)) >>>>> >>>>> also note that 'data' is not a very good variable name, as you are >>>>> masking an existing function. When creating variable names it is >>>>> often enlightening to type the name first at an R prompt to see if >>>>> you get any response. >>>>> >>>>> >>>>>> >>>>>> As a result of my limited R experience, I failed in finding a >>>>>> method to effect Bolstad (quantile) normalization on this output. >>>>>> >>>>>> 2. Thoughts/comments on the benefits/caveats of normalizing before >>>>>> or after summarizing probe intensities. >>>>> >>>>> >>>>> >>>>> Normalizing after summarization for something like rma() seems >>>>> questionable to me. Since the expression values are based on >>>>> fitting a model to the PM probe values, if you don't normalize >>>>> first you are ignoring any non-biological variability which may end >>>>> up biasing your results. Using median polish for the model fit >>>>> should help protect against this, but I don't know that I would >>>>> want to take chances. >>>>> >>>>> As an aside, how far off are the histograms? Are you sure that >>>>> there is a reasonable difference? Eyeballing a histogram isn't the >>>>> best way to determine if the mean and variance are different or >>>>> not. A quick run through with some data here shows very little >>>>> differences: >>>>> >>>>> > eset <- justRMA(filenames=list.celfiles()[1:10]) >>>>> > apply(exprs(eset),2,summary) >>>>> Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6 Sample 7 >>>>> Min. 4.085 4.070 4.091 4.051 4.068 4.090 4.087 >>>>> 1st Qu. 5.835 5.859 5.832 5.812 5.842 5.858 5.852 >>>>> Median 7.079 7.069 7.048 7.061 7.070 7.077 7.080 >>>>> Mean 7.225 7.227 7.224 7.227 7.229 7.225 7.232 >>>>> 3rd Qu. 8.352 8.324 8.351 8.363 8.361 8.330 8.347 >>>>> Max. 14.550 14.440 14.420 14.400 14.490 14.430 14.260 >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> >>>>>> >>>>>> I look forward to any thoughts, advice & suggestions from users. >>>>>> >>>>>> thanks in advance, >>>>>> >>>>>> Karl >>>>>> >>>>>> >>>>>> =========================================== >>>>>> >>>>>> > sessionInfo() >>>>>> Version 2.3.0 (2006-04-24) >>>>>> i386-pc-mingw32 >>>>>> >>>>>> attached base packages: >>>>>> [1] "tools" "methods" "stats" "graphics" "grDevices" >>>>>> "utils" >>>>>> "datasets" "base" >>>>>> >>>>>> other attached packages: >>>>>> affy affyio Biobase >>>>>> "1.10.0" "1.0.0" "1.10.0" >>>>>> >>>>> >>>>> >>>> >>> >>> >> > > -- Karl Brand <k.brand at="" erasmusmc.nl=""> Department of Cell Biology and Genetics Erasmus MC Dr Molewaterplein 50 3015 GE Rotterdam lab +31 (0)10 408 7409 fax +31 (0)10 408 9468
ADD REPLY

Login before adding your answer.

Traffic: 474 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6