effect of normalization on analysis of differential knockdown

Rajarshi Guha

Hi,

I am analysing the results from a drug-sensitization siRNA screen and am trying to determine which genes are differentially knocked down between a vehicle-only run and a dosed run.

Each gene is targeted by 4 siRNAs, and my initial strategy has been to treat the signals from the 4 siRNAs as individual samples for that gene. I then perform a paired t-test on the 4 signals for a given gene across the two conditions, and calculate Storey's q-values from the resulting p-values.

The question: does (or should) the normalization of the plates affect the results of this analysis? For example, I considered two normalization schemes: (1) normalizing each plate to the median of a separate negative-control plate and (2) B-score normalization.

If I rank the genes by their q-values, the two normalization schemes give very different rankings. The p- and q-values also differ greatly: with median normalization I get a number of q-values < 0.05, but with B-scores I get a single gene with a q-value < 0.05 (and the next closest value is 0.58).

Thinking that this study is analogous to differential expression studies with microarrays, I also ran my dataset through the SAM method (via siggenes). With SAM, the B-score normalized data give no hits (and pi0 = 1), whereas the median-normalized data give lots of hits.

I can see that B-score normalized data would differ in character from median-normalized data, since the actual signals are replaced with scaled residuals. But is it to be expected that normalization schemes lead to such different results in this type of analysis?

Any pointers would be appreciated.
Thanks,
Rajarshi Guha
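The two normalization schemes in the question can be sketched as below. This is a minimal Python/NumPy sketch under stated assumptions, not the poster's code. Plates are 2-D arrays (rows x columns of wells). Scheme 1 divides each well by the median of a separate negative-control plate. The B-score follows the usual recipe: a two-way median polish removes row and column effects, and the residuals are divided by their median absolute deviation (MAD). All function names and the toy plate are hypothetical. Implementations also differ in details such as the number of polish iterations and whether the MAD is multiplied by the 1.4826 consistency constant.

```python
import numpy as np


def control_median_normalize(plate, control_plate):
    """Scheme 1 (sketch): divide every well by the median of a negative-control plate."""
    return np.asarray(plate, dtype=float) / np.median(control_plate)


def median_polish_residuals(plate, n_iter=10, tol=1e-9):
    """Tukey's two-way median polish: plate = overall + row + column + residual.

    Alternately sweeps out row medians and column medians and returns the residuals.
    """
    resid = np.array(plate, dtype=float)  # work on a copy of the input
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1, keepdims=True)
        resid -= row_med
        col_med = np.median(resid, axis=0, keepdims=True)
        resid -= col_med
        # stop early once there is (almost) nothing left to sweep out
        if max(np.abs(row_med).max(), np.abs(col_med).max()) < tol:
            break
    return resid


def b_score(plate, n_iter=10):
    """Scheme 2 (sketch): median-polish residuals divided by their MAD."""
    resid = median_polish_residuals(plate, n_iter=n_iter)
    mad = np.median(np.abs(resid - np.median(resid)))
    return resid / mad


def lowest_well(values):
    """(row, column) of the smallest value, as plain ints."""
    return tuple(int(i) for i in np.unravel_index(np.argmin(values), values.shape))


# Hypothetical 8 x 12 plate: additive row/column gradients, noise, one hit.
rng = np.random.default_rng(0)
plate = 100 + np.add.outer(5.0 * np.arange(8), 2.0 * np.arange(12))
plate = plate + rng.normal(0.0, 1.0, size=(8, 12))
plate[3, 4] -= 10  # a knocked-down well, smaller than the gradient across the plate
control_plate = np.full((8, 12), 50.0)  # stand-in negative-control plate

scaled = control_median_normalize(plate, control_plate)  # gradient is kept
b = b_score(plate)                                      # gradient is removed

print(lowest_well(scaled))  # a well in row 0, where the gradient is lowest
print(lowest_well(b))       # (3, 4): the knocked-down well
```

The contrast is the one the question describes. Scheme 1 only rescales each well, so any row/column pattern on the plate survives into the t-tests. The B-score replaces each signal with a robust, plate-specific residual, so those patterns and the plate's overall level are gone before any test is run.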

Wolfgang Huber

Hi Rajarshi,

Your t, p, q value computation seems reasonable to me. You may want to choose a regularised version of the t-test (like in limma's eBayes). With only 4 samples, you may otherwise get an unnecessarily large fraction of false discoveries because the sample variance is small (and t large) by chance.

As for your question about the choice of normalisation method, one possible answer (perhaps not too constructive, but not ignorable) is that the technical or biological variability ("noise") in your data is stronger than the biological signal.

Best wishes,
Wolfgang
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
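Below is a minimal sketch of the contrast Wolfgang describes. It assumes NumPy and SciPy, and all function names and data are hypothetical. It is not limma's eBayes. limma estimates the prior degrees of freedom d0 and the prior variance s0² from all genes, whereas here d0 = 4 and s0² = the median per-gene variance are arbitrary fixed choices. The q-value function is likewise a simplified version of Storey's method: it estimates pi0 at a single lambda = 0.5, while the qvalue package also offers smoothed estimates across several lambdas. Rows are genes, and columns are the 4 siRNAs, matched between the vehicle and dosed runs.

```python
import numpy as np
from scipy import stats


def paired_t(vehicle, dosed):
    """Ordinary paired t-test per gene (rows = genes, columns = matched siRNAs)."""
    t, p = stats.ttest_rel(dosed, vehicle, axis=1)
    return t, p


def moderated_paired_t(vehicle, dosed, d0=4.0, s0_sq=None):
    """Simplified limma-style moderated t on the per-siRNA differences.

    Each gene's variance is shrunk toward a common prior variance s0_sq that is
    worth d0 extra degrees of freedom. limma's eBayes estimates d0 and s0_sq
    from all genes; here they are fixed by hand (an assumption of this sketch).
    """
    diff = np.asarray(dosed, dtype=float) - np.asarray(vehicle, dtype=float)
    n = diff.shape[1]
    d = n - 1                                  # residual df per gene
    mean = diff.mean(axis=1)
    s_sq = diff.var(axis=1, ddof=1)            # per-gene sample variance
    if s0_sq is None:
        s0_sq = np.median(s_sq)                # crude stand-in for the prior variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    t = mean / np.sqrt(s_tilde_sq / n)
    p = 2 * stats.t.sf(np.abs(t), df=d0 + d)   # moderated t has d0 + d df
    return t, p


def storey_qvalues(p, lam=0.5):
    """Storey q-values with pi0 estimated at a single lambda."""
    p = np.asarray(p, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1 - lam))
    order = np.argsort(p)
    ranked = pi0 * m * p[order] / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # make q monotone in p
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q, pi0


# Hypothetical data: 3 genes x 4 siRNAs; the vehicle run is all zeros for simplicity.
vehicle = np.zeros((3, 4))
dosed = np.array([
    [1.00, 1.01, 0.99, 1.00],  # modest shift, tiny variance by chance
    [2.0, 3.0, 1.0, 2.0],
    [0.5, -0.5, 1.0, -1.0],
])
t_ord, p_ord = paired_t(vehicle, dosed)
t_mod, p_mod = moderated_paired_t(vehicle, dosed)
print(t_ord[0], t_mod[0])  # roughly 245 vs 3.2 for the first gene

q, pi0 = storey_qvalues([0.001, 0.01, 0.02, 0.5, 0.6, 0.9])
print(q, pi0)  # q ~ [0.004, 0.02, 0.0267, 0.48, 0.48, 0.6], pi0 ~ 0.667
```

The first gene illustrates the concern. Its four differences agree almost perfectly by chance, so the ordinary t is huge. The moderated t borrows variance information from the other genes and is far more modest. For the q-values, pi0 is the estimated fraction of truly null genes. This estimator reaches 1 whenever at least half of the p-values exceed 0.5, as uniform (null) p-values would. SAM estimates pi0 differently, but a value of 1 carries a similar message: no detectable excess of small p-values.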

Naomi Altman

(Replying to Wolfgang Huber's message of 7/18/2009, 05:14 PM.)

Why would you bother to normalize if it did not affect the results of the analysis? The purpose of normalization is to dampen some of the noise so that the signal (i.e., differential expression) is clearer. The normalization method can have a huge effect, depending on how much noise there was in the experiment and whether the assumptions underlying the normalization are met.

I am not familiar with B-score normalization. Normalization to the median of a particular treatment or control makes sense if you expect the median of all the samples to be the same except for noise. If not (e.g., if there is down-regulation but no up-regulation), then you are inducing signal by normalizing.

--Naomi
Naomi S. Altman
Associate Professor, Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
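Naomi's last point can be made concrete with a toy example using hypothetical numbers and a hypothetical helper name, with NumPy assumed. Here each plate is divided by the median of its own wells, not by a separate control plate as in scheme 1 of the question. When many genes go down and none go up, the dosed plate's median drops and the unaffected genes are pushed up.

```python
import numpy as np


def self_median_normalize(plate):
    """Hypothetical helper: divide each well by the median of the same plate."""
    plate = np.asarray(plate, dtype=float)
    return plate / np.median(plate)


# Toy plates of 10 wells (one gene per well). The vehicle plate is flat; in the
# dosed plate 6 genes are knocked down to half their signal and none go up.
vehicle = np.full(10, 100.0)
dosed = np.array([50.0] * 6 + [100.0] * 4)

# Dosed/vehicle ratio after normalizing each plate to its own median.
ratio = self_median_normalize(dosed) / self_median_normalize(vehicle)
print(ratio)
# The dosed median is 50, so the 6 knocked-down genes come out at 1.0 (no change)
# and the 4 unaffected genes at 2.0 (apparent 2-fold up-regulation).
```

In this toy case the normalization does worse than nothing. The raw dosed/vehicle ratios were correct: 0.5 for the knocked-down genes and 1.0 for the rest. Median scaling turns them into 1.0 and 2.0.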

Wolfgang Huber

Hi Naomi,

Of course normalisation is useful. I want to point out the importance of complementing it with quality assessment and control.

Just comparing different normalisation 'black boxes' on the basis of the resulting hit lists (which the original post seemed to hint at, and which has all too often been done with microarray data in this community) is less advisable.

Best wishes,
Wolfgang

Naomi Altman

(Replying to Wolfgang Huber's message of 7/20/2009, 06:12 AM.)

Dear Wolfgang,

Sorry for any misunderstanding. I was responding to the original post, not to your useful comments.

--Naomi