Entering edit mode
Hi Karl,
You have to convert the expression data to a data.frame in order to
get
a boxplot for each column.
boxplot(data.frame(exprs(datrma)))
I am shocked at the boxplots for rma(). I have never seen RMA
processed
data look that different, which makes me wonder what the raw data look
like. Also, are you sure these are RMA processed data (e.g., you
didn't
accidently mask the rma() processed data with other data)? Unless the
raw data are completely horrible, I don't know how you could get such
disparate results.
Best,
Jim
k. brand wrote:
> James,
>
> Thank you for your fast detailed response.
>
> At your suggetion i tried your suugestion- ie.,
>
> library(affyPLM)
> dat <- ReadAffy()
> datrma <- rma(dat, normalize=FALSE)
> datrma <- normalize.quantiles(exprs(datrma))
> boxplot(datrma)
>
> Find attached the boxplot output "RMA then QnormJimscript.jpeg"
which
> indicates some missing syntax since there is only one box for the 12
> arrays?! In fact in my attempts to realize this endeavour i saw this
> ouput too. Unfortunately my lack of R knowledge prevented me getting
> around whatever the problem is. If you have a further suggestion im
all
> ears.
>
> "data", the result of inheriting my teachers bad habits, is now
"dat"...
>
> Further i attach a boxplot (better than histogram right- "RMA vs
Qunt
> norm of MAS5 preproced & summed.jpg") comparing the two methods in
my
> orignal post. When i look at the variation of the RMA processed and
> normed data i question how effectively can i compare these arrays
with
> each. Especially along side the method shown which is MAS5
preprocessed
> then quantile normalized using a colleagues script. These arrays
look
> much more comparable, even ideally so. Why dont i just use this
appraoch
> you may ask? A: im convinced RMA is a superior preprocessing and
> summarizing method...i just need to be able to reconcile the
apparent
> variation in the final output. Or perhaps, better understand it.
>
> Your further thoughts, suggests greatly appreciated,
>
> karl
>
>
>
>
>
> on 9/11/2006 3:42 PM James W. MacDonald said the following:
>
>> k. brand wrote:
>>
>>> Dear All,
>>>
>>> I compared two normalization approaches for an experiment using
>>> twelve affy 430-2.0 chips. (histogram plot comparing bith methods
>>> forwarded on request).
>>>
>>> #1. RMA
>>> library(affy)
>>> data <- ReadAffy()
>>> datarma <- rma(data)
>>> exprs2excel(datarma, file="dataRMA.csv")
>>>
>>> Plotting histograms of the output shows arrays NOT perfectly
aligning
>>> at the means and spreads.
>>>
>>> I used a custom script to effect a quantile normalization on MAS5
>>> preprocessed but unnormalized data-
>>>
>>> #2. Mas5 sans interchip normalization
>>> library(affy)
>>> data <- ReadAffy()
>>> datamas5sannorm <- mas5(data, normalize=FALSE)
>>> exprs2excel(datamas5sannorm, file="datamas5sannorm.csv")
>>> f.qnorm <- function(x,qinit=0.75,perc=100) {...
>>>
>>> The means and spreads of this normalization approach do align
perfectly.
>>>
>>> THUS- summarizing probe intensites before or after normalization
does
>>> appear to make a noticeable difference, as may be expected.
>>>
>>> My questions/requests-
>>>
>>> 1. Help to effect Bolstad normalization of the RMA preprocessed
and
>>> summarized data. Whilst I succeed in generating unnormalized RMA
>>> preprocessed data with-
>>>
>>> library(affy)
>>> data <- ReadAffy()
>>> datarma <- rma(data, normalize=FALSE)
>>
>>
>> Next step would be
>>
>> datarma <- normalize.quantiles(exprs(datarma))
>>
>> also note that 'data' is not a very good variable name, as you are
>> masking an existing function. When creating variable names it is
often
>> enlightening to type the name first at an R prompt to see if you
get
>> any response.
>>
>>
>>>
>>> As a result of my limited R experience, I failed in finding a
method
>>> to effect Bolstad (quantile) normalization on this output.
>>>
>>> 2. Thoughts/comments on the benefits/caveats of normalizing before
or
>>> after summarizing probe intensities.
>>
>>
>> Normalizing after summarization for something like rma() seems
>> questionable to me. Since the expression values are based on
fitting a
>> model to the PM probe values, if you don't normalize first you are
>> ignoring any non-biological variability which may end up biasing
your
>> results. Using median polish for the model fit should help protect
>> against this, but I don't know that I would want to take chances.
>>
>> As an aside, how far off are the histograms? Are you sure that
there
>> is a reasonable difference? Eyeballing a histogram isn't the best
way
>> to determine if the mean and variance are different or not. A quick
>> run through with some data here shows very little differences:
>>
>> > eset <- justRMA(filenames=list.celfiles()[1:10])
>> > apply(exprs(eset),2,summary)
>> Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
Sample 7
>> Min. 4.085 4.070 4.091 4.051 4.068 4.090
4.087
>> 1st Qu. 5.835 5.859 5.832 5.812 5.842 5.858
5.852
>> Median 7.079 7.069 7.048 7.061 7.070 7.077
7.080
>> Mean 7.225 7.227 7.224 7.227 7.229 7.225
7.232
>> 3rd Qu. 8.352 8.324 8.351 8.363 8.361 8.330
8.347
>> Max. 14.550 14.440 14.420 14.400 14.490 14.430
14.260
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>> I look forward to any thoughts, advice & suggestions from users.
>>>
>>> thanks in advance,
>>>
>>> Karl
>>>
>>>
>>> ===========================================
>>>
>>> > sessionInfo()
>>> Version 2.3.0 (2006-04-24)
>>> i386-pc-mingw32
>>>
>>> attached base packages:
>>> [1] "tools" "methods" "stats" "graphics" "grDevices"
"utils"
>>> "datasets" "base"
>>>
>>> other attached packages:
>>> affy affyio Biobase
>>> "1.10.0" "1.0.0" "1.10.0"
>>>
>>
>>
>
--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
**********************************************************
Electronic Mail is not secure, may not be read every day, and should
not be used for urgent or sensitive issues.