Hello bioconductors,
I fit a linear model to my data with 3 coefficients. I used loess
normalization on genepix data with no background correction. With my
data set, loess normalization resulted in slight reductions in p values
(relative to "median" normalization for example) and reordering of the
lists of DE genes for all three coefficients, which I took to be a good
sign. I also have a series of replicate spots, and running
duplicateCorrelation and including the consensus correlation (~0.55)
term in my linear fit further improved the p values and resulted in some
changes in the lists of DE genes. All of this suggests to me that loess
and duplicate correlation served to reduce the estimate of variance in
gene expression and weed out artifacts.
However because I'm a little wary of normalisation, I took my raw data
set, non-normalized and non-background corrected and ran
duplicateCorrelation on it. For un-normalized data the consensus
correlation is ~0.73, quite a bit higher than for the loess-normalized
data. After running the same lmFit model with this data set I once again
obtained different lists of DE genes, with many of the strongest
conclusions carrying over, giving me confidence that I applied the
correct methods and function calls. My question is, should I be
suspicious of the normalized data set? Am I at significant risk of
generating large numbers of artifactual DE genes?
-Dennis
>Date: Wed, 30 Mar 2005 10:55:50 -0800
>From: Dennis Hazelett <hazelett@uoneuro.uoregon.edu>
>Subject: [BioC] loess and duplicate correlation
>To: bioconductor@stat.math.ethz.ch
>
>Hello bioconductors,
>I fit a linear model to my data with 3 coefficients. I used loess
>normalization on genepix data with no background correction. With my
>data set, loess normalization resulted in slight reductions in p values
>(relative to "median" normalization for example) and reordering of the
>lists of DE genes for all three coefficients, which I took to be a good
>sign. I also have a series of replicate spots, and running
>duplicateCorrelation and including the consensus correlation (~0.55)
>term in my linear fit further improved the p values and resulted in some
>changes in the lists of DE genes. All of this suggests to me that loess
>and duplicate correlation served to reduce the estimate of variance in
>gene expression and weed out artifacts.
Actually the two processes have different purposes. Loess normalization
reduces the residual variability. Duplicate correlation does not do this;
rather, it assesses the residual variability more realistically --
p-values may go up or down as a consequence.
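To see why accounting for the duplicate correlation changes the variance
assessment rather than simply shrinking it, consider a simplified model
(an assumption for illustration, not the generalized least squares fit
that lmFit actually performs): a gene is measured on n arrays with k
duplicate spots each, every spot has variance sigma^2, and duplicates on
the same array have correlation rho. The variance of the overall mean is
then sigma^2 * (1 + (k - 1) * rho) / (n * k), so treating the duplicates
as independent understates it by the factor 1 + (k - 1) * rho. The spot
variance and array count below are hypothetical; the two rho values are
the consensus correlations reported in this thread.

```python
def var_of_mean(sigma2: float, n_arrays: int, k_dups: int, rho: float) -> float:
    """Variance of the mean of n_arrays * k_dups spots when duplicates on the
    same array share correlation rho and different arrays are independent
    (a compound-symmetry model; hypothetical simplification)."""
    design_effect = 1.0 + (k_dups - 1) * rho  # variance inflation from correlated duplicates
    return sigma2 * design_effect / (n_arrays * k_dups)


if __name__ == "__main__":
    sigma2, n, k = 1.0, 4, 2  # hypothetical: unit spot variance, 4 arrays, duplicate spots
    naive = var_of_mean(sigma2, n, k, rho=0.0)  # duplicates treated as independent
    for rho in (0.55, 0.73):  # consensus correlations quoted in the thread
        v = var_of_mean(sigma2, n, k, rho)
        print(f"rho={rho:.2f}: var={v:.4f}, inflation over naive={v / naive:.2f}x")
```

In a real fit the correlation also changes how sigma^2 itself is
estimated from the residuals, so this single factor does not by itself
predict whether a given p-value rises or falls.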
>However because I'm a little wary of normalisation,
Given the enormous weight of evidence showing that microarray data needs
to be normalised, I'm wary of unnormalized data.
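For readers less familiar with loess normalization, here is a minimal
sketch of the idea on a single two-colour array, assuming
M = log2(R/G) and A = (log2 R + log2 G)/2: fit a smooth curve of M
against A and subtract it, which removes intensity-dependent dye bias
and so reduces the residual variability. The smoother is a hand-written
local-linear fit with tricube weights and no robustness iterations; it
is not limma's normalizeWithinArrays. The span of 0.3 and the simulated
intensities are assumptions for illustration.

```python
import numpy as np


def local_linear_smooth(x: np.ndarray, y: np.ndarray, span: float = 0.3) -> np.ndarray:
    """Fitted values of a local-linear regression of y on x, using tricube
    weights over the nearest `span` fraction of points (a simplified loess
    without the robustness iterations of full lowess)."""
    n = len(x)
    q = min(n, max(int(np.ceil(span * n)), 3))  # neighbours used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[q - 1]  # distance to the q-th nearest point
        h = h if h > 0 else 1e-12
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3  # tricube weights, zero beyond h
        X = np.column_stack([np.ones(n), x - x[i]])  # local line centred at x[i]
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]  # local intercept = fitted value at x[i]
    return fitted


def loess_normalize(red: np.ndarray, green: np.ndarray, span: float = 0.3) -> np.ndarray:
    """Return normalized log-ratios M for one array (hypothetical helper)."""
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    return m - local_linear_smooth(a, m, span)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    green = rng.lognormal(mean=8, sigma=1.0, size=500)
    lg = np.log2(green)
    # Simulated intensity-dependent dye bias plus spot noise, no true DE.
    red = green * 2 ** (0.3 * (lg - lg.mean()) + rng.normal(0, 0.2, 500))
    m_raw = np.log2(red) - np.log2(green)
    m_norm = loess_normalize(red, green)
    print(f"SD of M before: {m_raw.std():.3f}, after: {m_norm.std():.3f}")
```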
>I took my raw data set, non-normalized and non-background corrected and
>ran duplicateCorrelation on it. For un-normalized data the consensus
>correlation is ~0.73, quite a bit higher than for the loess-normalized
>data.
Effective normalisation improves the consistency of results between
arrays, and hence the duplicate correlation, which compares the
similarity of duplicate spots within arrays to that between arrays,
will tend to decrease. This is to be expected.
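The effect can be seen in a small simulation. Each hypothetical gene has
duplicate spots on six arrays, and each array carries an array-wide
shift shared by every spot on it, standing in for the kind of artifact
normalization removes. The duplicate correlation is estimated per gene
with a plain one-way ANOVA intraclass correlation and combined across
genes by averaging on the atanh scale. This is a simplified assumption,
not limma's duplicateCorrelation (which fits a mixed model per gene by
REML), and "normalization" is reduced to subtracting each array's
median, a deliberately crude stand-in for loess. All variance
components are invented for illustration.

```python
import numpy as np


def anova_icc(y: np.ndarray) -> float:
    """One-way ANOVA intraclass correlation for one gene.

    y has shape (n_arrays, k_dups): one row per array, one column per
    duplicate spot on that array.
    """
    k = y.shape[1]
    msb = k * y.mean(axis=1).var(ddof=1)  # between-array mean square
    msw = y.var(axis=1, ddof=1).mean()    # within-array (duplicate) mean square
    return float((msb - msw) / (msb + (k - 1) * msw))


def consensus_icc(data: np.ndarray) -> float:
    """Combine per-gene ICCs by averaging on the atanh scale.

    data has shape (n_genes, n_arrays, k_dups).
    """
    rhos = np.array([anova_icc(gene) for gene in data])
    rhos = np.clip(rhos, -0.99, 0.99)  # keep atanh finite for extreme estimates
    return float(np.tanh(np.arctanh(rhos).mean()))


def simulate(n_genes: int = 2000, n_arrays: int = 6, k: int = 2, seed: int = 0) -> np.ndarray:
    """Hypothetical log-ratios: gene-by-array biology shared by duplicates,
    a fixed array-wide shift shared by every spot on an array, and spot noise."""
    rng = np.random.default_rng(seed)
    biology = rng.normal(0.0, 0.5, size=(n_genes, n_arrays, 1))
    shift = np.linspace(-1.5, 1.5, n_arrays).reshape(1, n_arrays, 1)
    noise = rng.normal(0.0, 0.5, size=(n_genes, n_arrays, k))
    return biology + shift + noise


if __name__ == "__main__":
    raw = simulate()
    # Crude stand-in for normalization: remove each array's median shift.
    normalized = raw - np.median(raw, axis=(0, 2), keepdims=True)
    print(f"duplicate correlation, raw:        {consensus_icc(raw):.2f}")
    print(f"duplicate correlation, normalized: {consensus_icc(normalized):.2f}")
```

Removing the shared array shift lowers the estimated correlation, while
the gene-by-array component that duplicates genuinely share keeps it
well above zero.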
> After running the same lmFit model with this data set I once again
>obtained different lists of DE genes, with many of the strongest
>conclusions carrying over, giving me confidence that I applied the
>correct methods and function calls. My question is, should I be
>suspicious of the normalized data set? Am I at significant risk of
>generating large numbers of artifactual DE genes?
>-Dennis
You haven't stated any reason for suspicion -- you seem to have had only
good experience -- so it is hard to know what further to say.
Gordon
Hi Gordon,
Thanks, that clears up the issue for me. I had a hunch this might be the
case but I wanted to hear it from someone who understands the statistics
better than I. The reason I say I'm "wary of normalization" isn't that I
dispute the evidence that it removes unwanted variation inherent in the
technology. It is mainly because I'm wary of my ability to apply it
correctly. ;-)
-d
Gordon Smyth wrote:
>
>> Date: Wed, 30 Mar 2005 10:55:50 -0800
>> From: Dennis Hazelett <hazelett@uoneuro.uoregon.edu>
>> Subject: [BioC] loess and duplicate correlation
>> To: bioconductor@stat.math.ethz.ch
>>
>> Hello bioconductors,
>> I fit a linear model to my data with 3 coefficients. I used loess
>> normalization on genepix data with no background correction. With my
>> data set, loess normalization resulted in slight reductions in p values
>> (relative to "median" normalization for example) and reordering of the
>> lists of DE genes for all three coefficients, which I took to be a good
>> sign. I also have a series of replicate spots, and running
>> duplicateCorrelation and including the consensus correlation (~0.55)
>> term in my linear fit further improved the p values and resulted in
>> some changes in the lists of DE genes. All of this suggests to me that
>> loess and duplicate correlation served to reduce the estimate of
>> variance in gene expression and weed out artifacts.
>
>
> Actually the two processes have different purposes. Loess
> normalization reduces the residual variability. Duplicate correlation
> does not do this; rather, it assesses the residual variability more
> realistically -- p-values may go up or down as a consequence.
>
>> However because I'm a little wary of normalisation,
>
>
> Given the enormous weight of evidence showing that microarray data
> needs to be normalised, I'm wary of unnormalized data.
>
>> I took my raw data
>> set, non-normalized and non-background corrected and ran
>> duplicateCorrelation on it. For un-normalized data the consensus
>> correlation is ~0.73, quite a bit higher than for the loess-normalized
>> data.
>
>
> Effective normalisation improves the consistency of results between
> arrays, and hence the duplicate correlation, which compares the
> similarity of duplicate spots within arrays to that between arrays,
> will tend to decrease. This is to be expected.
>
>> After running the same lmFit model with this data set I once again
>> obtained different lists of DE genes, with many of the strongest
>> conclusions carrying over, giving me confidence that I applied the
>> correct methods and function calls. My question is, should I be
>> suspicious of the normalized data set? Am I at significant risk of
>> generating large numbers of artifactual DE genes?
>> -Dennis
>
>
> You haven't stated any reason for suspicion -- you seem to have had
> only good experience -- so it is hard to know what further to say.
>
> Gordon