quantile normalization approach


Wang, Hui ▴ 170

@wang-hui-219

Last seen 10.3 years ago

Hi List,

Quantile-quantile normalization assumes that the data sets being normalized share a common distribution. I am fine with using it to normalize replicates. However, is the assumption still valid for different experiments, such as data from different tissues?

Could anybody point me to a reference that makes this comparison under many different experimental conditions (for example, more than 10 different tissues or cell-line experiments)? I have read all the papers and documents I can find, but I am still not convinced we can use that assumption.

Thanks

Regards
-h
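For readers who want the assumption spelled out, here is a minimal sketch of quantile normalization in plain Python. It is not the Bioconductor implementation: the function name `quantile_normalize`, the toy `chips` data, and the tie handling (ties broken by position) are assumptions made for illustration.

```python
from statistics import mean

def quantile_normalize(arrays):
    """Force every array to share one empirical distribution.

    arrays: list of equal-length lists of intensities (one list per chip).
    Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(column) for column in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        # order[k] is the index of the k-th smallest value in this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Hypothetical toy data: three chips, four probes each.
chips = [[5.0, 2.0, 3.0, 4.0],
         [4.0, 1.0, 4.5, 2.0],
         [3.0, 4.0, 6.0, 8.0]]
normalized = quantile_normalize(chips)
```

After this step every chip contains exactly the same set of values, only in a different probe order. That is the "common distribution" assumption made literal, and it is why the method is questioned above for samples whose true distributions may differ.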

Normalization • 1.4k views

ADD COMMENT • link 21.8 years ago Wang, Hui ▴ 170


Rafael A. Irizarry ★ 2.3k

@rafael-a-irizarry-205

Last seen 10.3 years ago

On Sat, 22 Mar 2003, Wang, Hui wrote:

> Quantile-quantile normalization assumes common distribution for data sets to be normalized. I am fine with replicate normalization using this. However, for different experiments, such as data from different tissues, is the assumption still valid?

Probably not. But when replicate arrays have completely different distributions, in my opinion one is left with no choice but to make such assumptions. Are you willing to make the assumption that they all have the same median? How about the same quartiles? Where to draw the line is not easy.

> Could anybody point me to some reference that conducts comparison under many different experimental conditions? (for example, under >10 different tissues or cell line experiments). I read all the papers/documents I can find. But still not convinced we can use that assumption.

Both RMA papers (Biostatistics and NAR) apply the method to the dilution data set, which has liver and central nervous system cell lines.
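The "same median, same quartiles" ladder can be made concrete with a small sketch. The helper names `median_center` and `iqr_scale` and the two toy chips are hypothetical, and the values are assumed to be on a log scale; this only illustrates how each step assumes a little more than the previous one.

```python
from statistics import median, quantiles

def median_center(log_values):
    """Shift one array so its median is 0 (assumes log-scale intensities)."""
    m = median(log_values)
    return [v - m for v in log_values]

def iqr_scale(log_values):
    """Median-center, then rescale so the interquartile range is 1."""
    centered = median_center(log_values)
    q1, _, q3 = quantiles(centered, n=4)
    spread = q3 - q1
    if spread == 0:
        raise ValueError("interquartile range is zero; cannot rescale")
    return [v / spread for v in centered]

# Hypothetical chips: chip_b is shifted and more spread out than chip_a.
chip_a = [6.0, 7.0, 8.0, 9.0, 10.0]
chip_b = [8.0, 10.0, 12.0, 14.0, 16.0]

centered_a, centered_b = median_center(chip_a), median_center(chip_b)
scaled_a, scaled_b = iqr_scale(chip_a), iqr_scale(chip_b)
```

Matching medians alone leaves the two chips with different spreads; matching the quartiles as well makes these two toy chips agree. Quantile normalization is the far end of the same ladder, where every quantile is forced to match.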

ADD COMMENT • link 21.8 years ago Rafael A. Irizarry ★ 2.3k


Wang, Hui ▴ 170

@wang-hui-219

Last seen 10.3 years ago

Dear Rafael,

Thanks for your reply.

> > Quantile-quantile normalization assumes common distribution for data sets to be normalized. I am fine with replicate normalization using this. However, for different experiments, such as data from different tissues, is the assumption still valid?
>
> Probably not. But when replicate arrays have completely different distributions, in my opinion one is left with no choice but to make such assumptions. Are you willing to make the assumption that they all have the same median? How about the same quartiles? Where to draw the line is not easy.

I agree with you. Here is my thought: for replicates (for example, three chips from one sample preparation), it is probably a valid assumption, even if one replicate is two times brighter than the other (of course, you have to get rid of problematic chips first, for example chips that have scratches). Sample variation has less effect here.

However, for samples from different tissues, it is hard to believe this is true. It is very possible that the samples belong to the same type of distribution but with different means and variances (I have looked at many QQ plots from different experiments). Of course, genes with obvious, biologically relevant expression changes are usually a minority in a huge data set, so it is probably still OK to make that assumption. I am just trying to find out whether there are rigorous comparisons (as opposed to an assumption).

> > Could anybody point me to some reference that conducts comparison under many different experimental conditions? (for example, under >10 different tissues or cell line experiments). I read all the papers/documents I can find. But still not convinced we can use that assumption.
>
> Both RMA papers (Biostatistics and NAR) apply the method to the dilution data set that has liver and central nervous system cell lines.

I read these papers. They are very good and well written. For the dilution data with the same background, they help in understanding replicate normalization and give some understanding of sample variation. However, to understand issues across different samples (totally different backgrounds), two cell lines may not be enough (of course, these two cell lines seem carefully chosen). I am thinking that something like known amounts of spike-ins, added before/after sample preparation in many different tissue/cell-line backgrounds, would give a better understanding. It would address variation caused by chips and sample preps as well as by different sample background complexity. The last one is probably more biologically relevant.

Does this make sense at all?

What, in your opinion, is the best normalization method so far?

Regards
-h
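One way the spike-in idea could be evaluated is sketched below, under stated assumptions: the nominal concentrations, the tissue labels, and the intensities are all invented, and `fit_line` is a hypothetical helper doing an ordinary least-squares fit. The idea is to regress normalized log2 intensity on log2 nominal concentration separately in each background and compare the slopes.

```python
from math import log2

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical nominal spike-in concentrations (arbitrary units).
nominal = [1.0, 2.0, 4.0, 8.0, 16.0]
# Hypothetical normalized log2 intensities of the same spike-ins
# measured against two different sample backgrounds.
observed_log2 = {
    "liver": [6.1, 7.0, 8.1, 8.9, 10.0],
    "brain": [6.0, 6.6, 7.2, 7.9, 8.4],
}

log_nominal = [log2(c) for c in nominal]
slopes = {tissue: fit_line(log_nominal, values)[0]
          for tissue, values in observed_log2.items()}
```

In this made-up example the slope is about 0.97 in one background and about 0.61 in the other. A gap of that kind, if seen in real data, would suggest that the normalization or the background is compressing the signal differently across tissues.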

ADD COMMENT • link 21.8 years ago Wang, Hui ▴ 170


Dear Hui,

I have a few comments too (inserted in your previous post).

On Sat, Mar 22, 2003 at 04:04:21PM -0800, Wang, Hui wrote:

> Dear Rafael,
>
> Thanks for your reply.
>
> > > Quantile-quantile normalization assumes common distribution for data sets to be normalized. I am fine with replicate normalization using this. However, for different experiments, such as data from different tissues, is the assumption still valid?
> >
> > Probably not. But when replicate arrays have completely different distributions, in my opinion one is left with no choice but to make such assumptions. Are you willing to make the assumption that they all have the same median? How about the same quartiles? Where to draw the line is not easy.
>
> I agree with you. Here is my thought, for replicates (for example, three chips from one sample preparation), it is probably a valid assumption (of course you have to get rid of problematic chips first, for example chips that have scratches), even if one replicate is 2 times brighter than the other. The sample variation has less effect here.
>
> However, for samples from different tissues, it is hard to believe this is true. It is very possible that the samples belong to the same type of distribution, however with different mean and variance (I looked at many QQ plots from different experiments). Of course genes with obvious expression changes (biologically relevant) usually are a minority for a huge data set. It is probably still OK to have that assumption. I just try to find out whether there are rigorous comparisons (vs an assumption).

I am completely on your side about the underlying assumptions of what I would call "distribution-driven transformation methods". When using them, one clearly assumes that, on the biological side of the story, only very few genes are differentially expressed across the different experiments. If one has any reason to suspect that this is not the case (*), those normalization methods are to be used with care. The method 'invariantset' could make you feel more confident in such cases.

However, this does not necessarily mean that these normalization methods are unacceptable in such cases. I did run one of them (**) on data from different tissues, and I was pleasantly surprised when looking at a matrix of scatter plots of the probe-level intensities: the difference between tissues could be observed visually. But, naturally, a more in-depth study of these normalization methods for these cases would be needed.

Doing a spike-in of thousands of genes is obviously not the thing to do, but I remember seeing a draft of a paper on a web site that used a very clever idea: using the mRNA from two different tissues, a third condition was created by mixing RNA from the two tissues. The first name on the draft was William J Lemon (whose email cannot be found in my messy ${HOME} at the moment); he may have other suggestions too...

(*): like comparing cells from different tissues as you mentioned, or maybe studies of dividing/resting cells, or diauxic shift, or reaction to heat shock, or healthy/infected cells...

(**): can't remember which one it was now... quantiles, qspline, something else?

Hopin' it helps,

Laurent

> > > Could anybody point me to some reference that conducts comparison under many different experimental conditions? (for example, under >10 different tissues or cell line experiments). I read all the papers/documents I can find. But still not convinced we can use that assumption.
> >
> > Both RMA papers (Biostatistics and NAR) apply the method to the dilution data set that has liver and central nervous system cell lines.
>
> I read these papers. They are very good papers and well-written. For the dilution data with the same background, it can help to understand replicate normalization and give some understanding of sample variation. However, to understand issues across different samples (totally different background), two cell lines may not be enough (of course, these two cell lines seem carefully chosen). I am thinking something like known amounts of spike-ins before/after sample preparation in many different tissue/cell line backgrounds would give a better understanding. It will address variations caused by chips, sample preps as well as different sample background complexity. The last one is probably more biologically relevant.
>
> Does this make sense at all?
>
> What in your opinion is the best normalization method so far?
>
> Regards
>
> -h

--
Laurent Gautier
PhD student, CBS, Building 208, DTU, DK-2800 Lyngby, Denmark
(currently at the National Yang-Ming University in Taipei, Taiwan)
tel: +45 45 25 24 89
http://www.cbs.dtu.dk/laurent
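The mixing design lends itself to a simple consistency check, sketched below. This is an assumption about how such data could be used, not a description of what the draft paper did: the helper names, the 50/50 mixing fraction, and the intensities are all hypothetical, and the check presumes that raw-scale signal is roughly linear in RNA amount.

```python
from math import log2

def expected_mixture(raw_a, raw_b, fraction_a=0.5):
    """Expected raw-scale intensities when RNA from A and B is mixed."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    if len(raw_a) != len(raw_b):
        raise ValueError("both tissues must have the same number of probes")
    return [fraction_a * a + (1.0 - fraction_a) * b
            for a, b in zip(raw_a, raw_b)]

def mean_abs_log2_error(observed_raw, expected_raw):
    """Average absolute log2 difference between observed and expected."""
    return sum(abs(log2(o) - log2(e))
               for o, e in zip(observed_raw, expected_raw)) / len(expected_raw)

# Hypothetical normalized intensities (raw scale) for four probes.
tissue_a = [100.0, 400.0, 50.0, 800.0]
tissue_b = [100.0, 100.0, 200.0, 800.0]
mixed_observed = [100.0, 250.0, 125.0, 800.0]

expected = expected_mixture(tissue_a, tissue_b, fraction_a=0.5)
error = mean_abs_log2_error(mixed_observed, expected)
```

Here the observed mixture matches the expectation exactly, so the error is zero. With real arrays, comparing this error across normalization methods would give one empirical handle on how well each method copes with samples from different tissues.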

ADD REPLY • link 21.8 years ago Laurent Gautier ★ 2.3k
