RE: quantile normalization
Mark Reimers ▴ 70
@mark-reimers-658
Last seen 7.1 years ago
How different are your 'biologically different' samples? In our experience, quantile-normalizing across very different samples makes a noticeable difference to relative expression within groups of fairly similar samples.

Our main data is cancer cell lines from 9 different tissues, and we find considerable differences (i.e. 2% of genes differ by more than a factor of 2) when comparing normalization within tissue-of-origin to normalization across all samples. My opinion now is that we should normalize within tissue-of-origin, and then standardize raw data across tissues by scaling to a constant median.

However, I find that RMA (1.3) gives different numbers when I separate the normalize( cel.data ) step from estimation ( rma( normed.data , normalize=F) ), compared with rma( cel.data ). Has anyone else observed this?

Regards

Mark

Message: 1
Date: Sun, 29 Aug 2004 18:00:30 -0400
From: "H. Han" <hihan@brown.edu>
Subject: [BioC] quantile normalization
To: <bioconductor@stat.math.ethz.ch>
Message-ID: <00bb01c48e13$99faa680$48c49480@micron10>
Content-Type: text/plain

Hi:

Does anyone have input on the compatibility of "replicate-only" vs. "all-sample" quantile normalization? I'd assume that the "truly significant" genes would be picked up by either method, though the latter is surely more conservative (it forces the same distribution across all samples, replicates or not). My analysis, though, seems to select two distinct lists of genes with the two methods: if I pick the top few hundred from each list, there is little overlap. Is that because my initial pool of genes is large (10,000 or so), or because the two methods rest on inherently different assumptions and are not to be compared?

Thanks in advance,

Hillary
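The two-step scheme Mark proposes (quantile-normalize within tissue-of-origin, then scale every array to a constant median) can be sketched in a few lines. This is only an illustration in plain Python, not the code behind affy's normalize() or rma(): the function names, the tie handling, the choice of target median, and the intensity values are all assumptions made for the example.

```python
from statistics import median


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per array).

    Simplification: ties are broken by position rather than averaged.
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over the arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_genes)
    ]
    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def normalize_within_groups(columns, groups):
    """Quantile-normalize within each group, then scale to a common median."""
    result = [None] * len(columns)
    for label in sorted(set(groups)):
        members = [j for j, g in enumerate(groups) if g == label]
        normed = quantile_normalize([columns[j] for j in members])
        for j, col in zip(members, normed):
            result[j] = col
    # Hypothetical choice of target: the median of the per-array medians.
    target = median(median(col) for col in result)
    # Multiplicative scaling, which assumes positive raw-scale intensities.
    return [[v * target / median(col) for v in col] for col in result]


# Invented raw intensities: two "lung" and two "colon" arrays, 5 genes each.
arrays = [
    [100, 200, 400, 800, 1600],
    [120, 180, 500, 700, 1500],
    [300, 900, 600, 2400, 1200],
    [350, 1000, 500, 2000, 1500],
]
groups = ["lung", "lung", "colon", "colon"]
scaled = normalize_within_groups(arrays, groups)
print([round(median(col), 6) for col in scaled])
```

After this, arrays from the same tissue share one intensity distribution, while arrays from different tissues share only their median, so tissue-specific differences in the shape of the distribution are kept.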
H. Han ▴ 30
@h-han-895
Last seen 7.1 years ago
Hi, Mark and Jim:

Thanks for answering my post. In my case the samples are not too different: they come from different dosage levels of the same treatment. I was expecting some difference between the two normalization methods, though the actual difference is somewhat larger than expected (a dozen vs. a few thousand significant genes). One reason, of course, as Jim pointed out, is that the "replicate-only" method would pick out more false significant genes.

I now think that a particular reason our difference is so large is the number of replicates. I have 4 replicates per condition. When I do "replicate-only" normalization, there is a good chance that all four have similar quantile ranks, and hence the same expression values. This reduces the standard deviation and increases t values in general. I assume that as the number of replicates becomes larger, the outcomes of the two methods would become closer.

Regards,
Hillary
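The mechanism Hillary describes can be checked on a toy example: quantile normalization gives every array the same set of values, so a gene that holds the same rank on all replicates ends up with identical values and a standard deviation of zero. The sketch below is plain Python with invented log-scale values and a hypothetical quantile_normalize helper (ties broken by position); it is not the affy implementation. Equal ranks map to equal values whichever arrays are pooled, so how often this happens on real data with thousands of genes is something to measure rather than assume.

```python
from statistics import stdev


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per array).

    Simplification: ties are broken by position rather than averaged.
    """
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over the arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_genes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def per_gene_sd(columns):
    """Standard deviation of each gene across the arrays."""
    n_genes = len(columns[0])
    return [stdev([col[g] for col in columns]) for g in range(n_genes)]


# Invented values: 4 replicate arrays, 5 genes each.
# Gene 0 has the lowest value on every array; genes 1 and 4 swap ranks once.
replicates = [
    [5.0, 7.1, 9.3, 6.2, 8.0],
    [5.4, 7.9, 8.8, 6.0, 7.2],
    [4.9, 7.5, 9.9, 6.6, 8.1],
    [5.2, 7.0, 9.1, 6.9, 7.7],
]
sd_before = per_gene_sd(replicates)
sd_after = per_gene_sd(quantile_normalize(replicates))
for gene, (before, after) in enumerate(zip(sd_before, sd_after)):
    print(gene, round(before, 3), round(after, 3))
```

Gene 0 has a positive standard deviation before normalization and exactly zero afterwards, which would send a t statistic computed from it to infinity; gene 1, whose rank differs on one array, keeps a nonzero spread.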
@sangesbiogemit-668
Last seen 7.1 years ago
On Aug 30, 2004, at 6:01 PM, Reimers, Mark (NIH/NCI) wrote:

> My opinion now is that we should normalize within
> tissue-of-origin, and then standardize raw data across tissues by
> scaling to constant median. However I find that RMA (1.3) gives
> different numbers when I separate out the normalize( cel.data )
> process from estimation ( rma( normed.data , normalize=F)),
> compared with rma( cel.data). Has anyone else observed this?

Yes, I observed the same thing. There are more changed genes when you normalize within tissue-of-origin instead of using rma(cel.data). Now the problem is to understand whether this variability is due to a numerical or a biological issue.

By the way, if I remember correctly, the first vignette of RMA suggested normalizing data within sample-of-origin, while the next one suggested simply running rma(cel.data). Any comments?

Thanks

Remo
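A first step toward telling numerical from biological differences is to quantify how far two normalization routes disagree, in the spirit of Mark's "2% of genes greater than factor of 2" figure. The sketch below assumes two vectors of log2 expression values for the same genes (RMA reports expression on the log2 scale); the function name and the numbers are invented for illustration.

```python
from math import log2


def fraction_differing(log2_a, log2_b, fold=2.0):
    """Fraction of genes whose two estimates differ by more than `fold`.

    Inputs are log2 expression values for the same genes in the same order.
    """
    threshold = log2(fold)
    n_diff = sum(1 for a, b in zip(log2_a, log2_b) if abs(a - b) > threshold)
    return n_diff / len(log2_a)


# Invented log2 values for 5 genes on one array under the two routes.
within_tissue = [6.6, 8.0, 8.6, 6.5, 10.2]
all_samples = [6.8, 9.3, 8.5, 7.7, 10.1]
print(fraction_differing(within_tissue, all_samples))
```

Running the same comparison on two calls that should be equivalent (for example the separated and the combined RMA steps) gives a baseline for purely numerical disagreement against which the within-tissue vs. all-sample difference can be judged.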
H. Han ▴ 30
@h-han-895
Last seen 7.1 years ago
----- Original Message -----
From: "Remo Sanges" <sanges@biogem.it>

> There are more changed genes when you normalize within tissue-of-origin
> instead of rma(cel.data). Now the problem is to understand if this
> variability is due to numerical or biological issue.

I believe the larger number of changed genes can be explained by the change in assumptions. In all-sample normalization we assume that, on average, genes do not change (average diff = 0), whereas in within-group normalization we relax that assumption (average diff <> 0). In my case, for example, after within-group normalization I found that the second group in general expresses lower than the first. Both the number of significant genes and the direction of change (mostly repressed) are dictated by this "average diff".

Cheers,
Hillary
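The "average diff" argument can be made concrete. After all-sample quantile normalization every array carries the same set of values, so the between-group difference averaged over genes is forced to zero; within-group normalization leaves any global shift between the groups in place. The sketch below uses plain Python, invented log2 values, and hypothetical helper names; it is not the affy code.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (ties broken by position)."""
    n_genes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over the arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_genes)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalized.append(out)
    return normalized


def average_diff(columns, groups, first, second):
    """Mean over genes of (second-group mean minus first-group mean)."""
    cols_1 = [c for c, g in zip(columns, groups) if g == first]
    cols_2 = [c for c, g in zip(columns, groups) if g == second]
    n_genes = len(columns[0])
    return mean(
        mean(c[i] for c in cols_2) - mean(c[i] for c in cols_1)
        for i in range(n_genes)
    )


# Invented log2 values: the "high" dose arrays run lower overall.
arrays = [
    [8.0, 9.5, 7.2, 10.1, 6.4],
    [8.2, 9.1, 7.0, 10.4, 6.9],
    [7.1, 8.9, 6.5, 9.0, 6.0],
    [7.4, 8.6, 6.1, 9.3, 5.8],
]
groups = ["low", "low", "high", "high"]

all_sample = quantile_normalize(arrays)
within = quantile_normalize(arrays[:2]) + quantile_normalize(arrays[2:])

diff_all = average_diff(all_sample, groups, "low", "high")
diff_within = average_diff(within, groups, "low", "high")
print(round(diff_all, 6), round(diff_within, 6))
```

With these numbers the all-sample route reports an average difference of zero (up to rounding), while the within-group route keeps the global drop of the second group, which is what pushes many genes toward "repressed" in a per-gene test.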