Logged or unlogged data before using normalize.quantile?

Marcelo Luiz de Laia

Dear Bioconductor friends,

I have a question that I haven't found an answer to. If you know of a paper or article that explains it, please tell me.

I normalize our data using the normalize.quantile function. If I first transform our intensities (single channel) to log2, I don't get any differentially expressed genes in limma. But if I don't transform our data, I get some genes with p-values around 0.0001, which is great! Of course, when I transform the intensity data to log2, I get some NAs.

Why is there this difference? Am I wrong to do the analysis with unlogged data?

Thanks a lot,
Marcelo
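
One likely source of the NAs, assuming the intensities were background-subtracted before normalization (the post does not say): the logarithm is undefined for zero and negative values. In R, log2(0) is -Inf and log2 of a negative number is NaN (with a warning). Python's math.log2 raises an error instead, so the hypothetical helper safe_log2 below mimics R's behaviour explicitly. The intensities are made up for illustration.

```python
import math

def safe_log2(x: float) -> float:
    """Hypothetical helper that mimics R's log2(): -Inf at zero, NaN for negatives."""
    if x > 0:
        return math.log2(x)
    if x == 0:
        return -math.inf
    return math.nan  # R would return NaN (with a warning) here

# Hypothetical background-subtracted intensities for four spots.
intensities = [1024.0, 37.5, 0.0, -12.0]
logged = [safe_log2(v) for v in intensities]
n_missing = sum(1 for v in logged if not math.isfinite(v))
print(logged)     # 10.0, about 5.23, -inf, nan
print(n_missing)  # 2
```

Spots that end up at or below zero after background correction therefore drop out of a log-scale analysis, which is one reason the two analyses are not run on the same set of values.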

Question posted by Marcelo Luiz de Laia, 1 April 2005

Wolfgang Huber

Hi Marcelo,

the difference is that the power of the test you are doing can differ depending on whether you consider the data on the "raw" or on the log-transformed scale.

Also, the p-value calculated by limma is based on the assumption that the null distribution of the test statistic is a t-distribution; this assumption may hold more or less well in either case.

You are really doing two different tests: test A, say, consists of applying the t-statistic to the untransformed intensities; test B, say, applies the t-statistic to the transformed intensities.

Then, if you want to use the t-distribution to get p-values, you need to make sure that the null distribution of your test statistic is indeed (to a good enough approximation) t-distributed. You can check this e.g. by permutations. For that you need either a large number of replicates, or to pool variance estimators across genes.

If you don't want to make a parametric assumption to get p-values, you need a larger number of replicates; if you have these, you can, for example, calculate a permutation p-value.

So there is really no "right" or "wrong" about transforming, or about which transformation to use, as long as you don't violate the assumptions of the subsequent tests. If the assumptions are met, then the procedure with the highest power is preferable. And that depends very much on your data (about which you have not told us much).

Hope that helps. And here is another shameless plug: have a look at this paper:
Differential Expression with the Bioconductor Project
http://www.bepress.com/bioconductor/paper7

Best wishes
Wolfgang

Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax: +44 1223 494486
Http: www.ebi.ac.uk/huber
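
To make the "two different tests" point concrete, here is a minimal Python sketch of an exact permutation p-value for one gene, computed once on the raw and once on the log2 scale. This is an illustration only and not limma's method. It uses the ordinary pooled-variance t-statistic with no pooling of variance information across genes, and it enumerates every way of splitting the arrays into two groups, which is only feasible for small numbers of arrays. The intensities are hypothetical.

```python
import itertools
import math
import statistics

def t_stat(a, b):
    """Ordinary pooled-variance two-sample t-statistic (no moderation across genes)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def permutation_pvalue(a, b):
    """Exact two-sided permutation p-value: recompute |t| for every split of the pooled arrays."""
    pooled = list(a) + list(b)
    n, na = len(pooled), len(a)
    observed = abs(t_stat(a, b))
    hits = total = 0
    for idx in itertools.combinations(range(n), na):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so the observed split always counts despite rounding.
        if abs(t_stat(group_a, group_b)) >= observed - 1e-12:
            hits += 1
    return hits / total

# Hypothetical intensities for one gene, four arrays per condition.
control = [120.0, 135.0, 150.0, 160.0]
treated = [180.0, 210.0, 240.0, 900.0]
p_raw = permutation_pvalue(control, treated)
p_log = permutation_pvalue([math.log2(x) for x in control], [math.log2(x) for x in treated])
print(p_raw, p_log)  # two different tests, so the two p-values need not agree
```

Because the t-statistic is not invariant under a monotone transformation such as log2, the raw-scale and log-scale versions can rank the permutations differently. With 4 arrays per group there are only 70 splits, so the p-value is always a multiple of 1/70.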

Answer by Wolfgang Huber

Hi Marcelo,

As Wolfgang mentioned, a non-parametric permutation test is an option when the t-distribution assumption is not valid. But if you have few replicates (2-3), most permutation tests don't have power either. I would suggest you try the RankProd package, which would be powerful enough to detect differentially expressed genes with 2 replicates.

Best,
Fangxin

Fangxin Hong, Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong@salk.edu
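
RankProd is built around the rank product statistic. Within each replicate comparison, genes are ranked by fold change, and each gene's ranks are combined by their geometric mean. A gene that sits near the top in every replicate gets a small rank product. The Python sketch below computes only that statistic from a hypothetical matrix of log-ratios. It is not the RankProd package's API, and it omits the significance estimation that the package performs on top of the statistic.

```python
import math

def rank_products(log_ratios):
    """Rank product per gene: geometric mean of its within-replicate ranks.

    log_ratios[g][r] is the log-ratio (treated vs control) of gene g in replicate r.
    Rank 1 is the largest log-ratio, i.e. the strongest up-regulation.
    Ties are broken by gene order (a simplification).
    """
    n_genes, n_reps = len(log_ratios), len(log_ratios[0])
    ranks = [[0] * n_reps for _ in range(n_genes)]
    for r in range(n_reps):
        order = sorted(range(n_genes), key=lambda g: log_ratios[g][r], reverse=True)
        for rank, g in enumerate(order, start=1):
            ranks[g][r] = rank
    return [math.exp(sum(math.log(k) for k in row) / n_reps) for row in ranks]

# Hypothetical log2 ratios for 5 genes in 2 replicate comparisons.
data = [[2.1, 1.8], [0.1, -0.2], [1.5, 2.5], [-0.3, 0.0], [0.2, 0.4]]
print([round(rp, 2) for rp in rank_products(data)])  # [1.41, 4.47, 1.41, 4.47, 3.0]
```

Genes 0 and 2 are in the top two in both replicates and get the smallest rank products. The statistic uses only ranks, so it needs no estimate of each gene's variance from 2 replicates.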

Reply by Fangxin Hong, 4 April 2005

I just want to remind people that permutation tests, rank tests, etc. still require i.i.d. errors. So the variance needs to be stabilized even for nonparametric tests.

--Naomi

Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111

Reply by Naomi Altman

Kasper Daniel Hansen

Almost any statistical analysis you do will be affected by transforming the data. This is to be expected, so it is something you generally want to consider before (and during) your analysis. In the microarray literature there are countless examples showing that it is (usually) preferable to do your analysis on the log2 scale. One reason is that you are generally looking for relative changes rather than absolute changes, but there are others.

Kasper

Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen
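
A small illustration of the "relative changes" point, using hypothetical values. On the log2 scale a doubling and a halving are equal and opposite (+1 and -1), and a doubling means the same thing at low and at high intensity. The absolute differences (+100, -50, +5000) are neither symmetric nor comparable across intensity levels.

```python
import math

def log2_fold_change(before: float, after: float) -> float:
    """Relative change on the log2 scale: +1 means doubled, -1 means halved."""
    return math.log2(after / before)

# Hypothetical (before, after) values for one gene.
examples = {
    "doubled": (100.0, 200.0),
    "halved": (100.0, 50.0),
    "doubled, high intensity": (5000.0, 10000.0),
}
for name, (before, after) in examples.items():
    lfc = log2_fold_change(before, after)
    print(f"{name}: absolute change {after - before:+.0f}, log2 fold change {lfc:+.1f}")
# doubled: absolute change +100, log2 fold change +1.0
# halved: absolute change -50, log2 fold change -1.0
# doubled, high intensity: absolute change +5000, log2 fold change +1.0
```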

Answer by Kasper Daniel Hansen

Naomi Altman

The reason we take logs is that there is some evidence that the variance of the intensity increases with the intensity. This messes up statistical methods such as t-tests, ANOVA, limma, rank tests like the Wilcoxon test, and permutation tests such as SAM, which assume that the variance of a single gene does not depend on the expression level of the gene. Taking logs removes the dependence of the variance on the level when the variance increases quadratically with the intensity (that is, when the standard deviation is proportional to the mean).

If you do an MA plot of the log data, you will usually observe that on the log scale the variance is higher for low-intensity genes. This indicates that taking logarithms overcorrects. While a couple of fixes have been suggested (e.g. Churchill's work and MAANOVA), these use transformations that are not as readily understood as logarithms.

All in all, I would say that analysis based on the log data is more reliable than analysis based on the raw data. If all we were interested in were tests (and not e.g. estimates of fold difference), I would probably use another variance-stabilizing method, but this has not yet proved to be the case with the biologists I work with.

--Naomi
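
The variance argument can be checked with a small simulation. The error model here is an assumption for illustration and is not fitted to any real array. Each intensity is the true level times lognormal (multiplicative) noise, optionally plus additive Gaussian background noise, which is the kind of two-component model associated with the Rocke and Durbin papers mentioned later in the thread. With purely multiplicative noise, the raw-scale SD grows in proportion to the level while the log2-scale SD stays roughly constant. Adding background noise makes the log2-scale SD larger at low intensity, which is the overcorrection described above. All numbers (levels, noise sizes, seed) are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)

def simulate(level, n=5000, mult_sd=0.1, add_sd=0.0):
    """Hypothetical intensities: level * lognormal noise, plus optional additive Gaussian noise."""
    return [level * math.exp(random.gauss(0.0, mult_sd)) + random.gauss(0.0, add_sd) for _ in range(n)]

results = {}
for add_sd in (0.0, 10.0):
    for level in (100.0, 10000.0):
        y = simulate(level, add_sd=add_sd)
        logs = [math.log2(v) for v in y if v > 0]  # log2 is undefined for v <= 0
        raw_sd, log_sd = statistics.stdev(y), statistics.stdev(logs)
        results[(add_sd, level)] = (raw_sd, log_sd)
        print(f"additive sd {add_sd:>4}, level {level:>7}: raw SD {raw_sd:8.1f}, log2 SD {log_sd:.3f}")
```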

Answer by Naomi Altman

Hi Naomi,

> While a couple of fixes have been suggested (e.g. Churchill's work and MAANOVA), these use transformations that are not as readily understood as logarithms.

The simplicity of logarithms is somewhat of an illusion, though. There is no readily understood interpretation when the true expression of a gene in some of the conditions is zero (or close to zero). And there are many genes like that!

The only sane solution that I know of is some form of shrunken log-ratios (or "generalized", "moderated", whatever you call them). Some prefer to do it via transformation functions that differ from the logarithm at the lower end, others more through the back door, via biased background estimates (to make sure all the data stay away from zero), but the end result is similar.

Best regards
Wolfgang
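
A quick numerical illustration of why log-ratios have no stable interpretation near zero, using hypothetical numbers. Hold one condition at 100 and let the other shrink toward zero. Each step changes the small value by an amount that is tiny in absolute terms and plausibly within background noise, yet it moves the log2 ratio by more than 3 units. At exactly zero the ratio is undefined.

```python
import math

def log2_ratio(a: float, b: float) -> float:
    """Ordinary log2 ratio of two positive intensities."""
    return math.log2(a / b)

# Hypothetical: a gene measured at 100 in condition A, and at a small, noise-dominated
# value in condition B where its true expression is (close to) zero.
small_values = (10.0, 1.0, 0.1, 0.01)
ratios = [log2_ratio(100.0, b) for b in small_values]
for b, r in zip(small_values, ratios):
    print(f"B = {b:>5}: log2(A/B) = {r:.2f}")
# B =  10.0: log2(A/B) = 3.32
# B =   1.0: log2(A/B) = 6.64
# B =   0.1: log2(A/B) = 9.97
# B =  0.01: log2(A/B) = 13.29
```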

Reply by Wolfgang Huber, 3 April 2005

You bring up another problem.

We have discussed a number of times how to handle detection calls for Affy data (and we did not reach much of a conclusion, except to agree that the Affy p-value was questionable).

But absent calls are also problematic for other arrays. E.g. if the transcript is absent, P(FG>BG) should be close to 0.5, so the spot will be flagged about half the time. But if the transcript is present in one condition and absent in the other, surely this is highly important, and such a spot should not be flagged.

How are people handling this?

--Naomi
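
The "flagged about half the time" arithmetic can be checked by simulation. The model is a deliberate simplification and an assumption of this sketch: foreground and background of each spot are independent Gaussian draws, and an absent transcript means the foreground has the same distribution as the background. The background level, noise SD, signal size and seed are all hypothetical.

```python
import random

random.seed(0)

def fraction_fg_above_bg(signal, n_spots=20000, bg_mean=200.0, bg_sd=30.0):
    """Share of simulated spots with foreground > background.

    Hypothetical model: background ~ N(bg_mean, bg_sd) and, independently,
    foreground ~ N(bg_mean + signal, bg_sd). signal = 0 models an absent transcript.
    """
    above = 0
    for _ in range(n_spots):
        bg = random.gauss(bg_mean, bg_sd)
        fg = random.gauss(bg_mean + signal, bg_sd)
        above += fg > bg
    return above / n_spots

absent = fraction_fg_above_bg(signal=0.0)
present = fraction_fg_above_bg(signal=100.0)
print(absent, present)  # absent comes out near 0.5; present is close to 1
```

Under this model, a rule that flags spots with FG <= BG removes about half of the measurements in the condition where the transcript is absent, and those are exactly the measurements that carry the present-versus-absent difference.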

Reply by Naomi Altman, 3 April 2005

Jean Wu and I have a manuscript that discusses some of these issues. If you are interested, you can download it here: http://www.bepress.com/jhubiostat/paper73/

-r

Reply by Rafael A. Irizarry

Hi Naomi,

> How are people handling this?

Here's another reference, which supports the sequential approach (first preprocessing, then the "higher-level analysis"). Basically, the recommendation is to modify the log-ratios so that "generalized log-ratios" are shrunken towards 0 if the numbers involved are small but coincide with the usual log-ratio if they are large. "Small" and "large" are defined automatically in terms of the background noise. This allows all genes in an experiment to be treated in a uniformly consistent manner, without the need to flag small values.

http://www.ebi.ac.uk/huber/docs/hvhv.pdf (§3), and also Bioinformatics. 2002;18 Suppl 1:S96-104. PMID: 12169536

There are also a number of papers by David Rocke and Blythe Durbin about this.

Best regards
Wolfgang
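
A minimal sketch of a generalized log-ratio, assuming the common form glog(x) = log2((x + sqrt(x^2 + c^2)) / 2) with a noise-scale constant c. This is an illustration and not necessarily the exact transformation in the references above. Huber's vsn, for instance, uses an arsinh transformation with parameters fitted to the data, which differs from this form only by scaling and offset. Here c is simply set by hand to a hypothetical value standing in for the background-noise level. For x much larger than c the function approaches log2(x), and it stays finite at zero and for negative values.

```python
import math

def glog2(x: float, c: float) -> float:
    """Generalized log2: close to log2(x) for x >> c, finite and smooth at and below zero."""
    s = math.sqrt(x * x + c * c)
    # For negative x, use the algebraically equal form c^2 / (s - x) to avoid cancellation.
    inner = x + s if x >= 0 else c * c / (s - x)
    return math.log2(inner / 2.0)

def glog2_ratio(x1: float, x2: float, c: float) -> float:
    """Generalized log-ratio; shrinks toward 0 when both values are small relative to c."""
    return glog2(x1, c) - glog2(x2, c)

c = 50.0  # hypothetical noise scale, standing in for the background noise level
large = glog2_ratio(20000.0, 10000.0, c)
small = glog2_ratio(20.0, 10.0, c)
with_negative = glog2_ratio(200.0, -5.0, c)
print(large)          # close to the ordinary log2 ratio of 1
print(small)          # shrunk well below the ordinary log2 ratio of 1
print(with_negative)  # still defined although one value is negative
```

The design choice is the one described in the reply: both pairs have an ordinary log2 ratio of exactly 1, but only the pair that is large relative to the noise keeps a ratio near 1.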

Reply by Wolfgang Huber

Rhonda DeCook

With respect to permutation tests...

I'm under the impression that you only need independence, not the assumption of constant variance.

The permutation test provides us with a distribution of the test statistic under the null hypothesis (equal means in the two-sample scenario, i.e. all data were generated from one distribution, even though it may be an ugly-looking single distribution). As long as all 'groupings' of the data into 2 groups are equally likely (which is provided by the independence assumption), this permutation distribution of the test statistic (e.g. a t-statistic here) gives us an idea of the test statistic's distribution under the null without the assumption of normality or constant variance. Computing a permutation p-value from this null distribution provides a p-value that has the usual behavior under the null, i.e. Uniform(0,1), though in a discrete manner. When the alternative is true, the distribution of the p-value will have more mass near zero than the Uniform(0,1).

If this logic doesn't apply to the microarray setting, please let me know.

Rhonda
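
The "discrete manner" matters a great deal with the 2-3 replicates typical of these experiments, which is the point Fangxin raised earlier. An exact permutation p-value can only take values k/N, where N is the number of distinct ways to split the arrays into the two groups. The Python sketch below computes the smallest achievable two-sided p-value. It assumes no ties in the data and a statistic such as |t| whose value is unchanged when the two group labels are swapped.

```python
from math import comb

def min_two_sided_pvalue(n1: int, n2: int) -> float:
    """Smallest exact permutation p-value with n1 vs n2 arrays (no ties assumed).

    Every split of the n1 + n2 arrays into groups of sizes n1 and n2 is one permutation.
    For a two-sided statistic such as |t|, the observed split always counts, and with
    equal group sizes its mirror image (labels swapped) gives the same value too.
    """
    n_splits = comb(n1 + n2, n1)
    return (2 if n1 == n2 else 1) / n_splits

for n in (2, 3, 4, 5):
    print(f"{n} vs {n}: smallest p = {min_two_sided_pvalue(n, n):.4f}")
# 2 vs 2: smallest p = 0.3333
# 3 vs 3: smallest p = 0.1000
# 4 vs 4: smallest p = 0.0286
# 5 vs 5: smallest p = 0.0079
```

With 2 or 3 arrays per group, no gene can reach p < 0.05 with a per-gene exact permutation test, however large its change, and that is before any multiple-testing correction.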

Answer by Rhonda DeCook, 5 April 2005

Neither Rhonda nor I had it quite right. Permutation tests require "exchangeability under the null hypothesis", which means that when the null hypothesis is true, the distribution of the test statistic does not depend on the treatment to which the data are assigned. Independence is not enough. For example, if the data in one group are iid N(a, 5) and in the other group are iid N(b, 25), then the permutation distribution of the t-statistic under the hypothesis a = b does not provide an appropriate null distribution. But if exchangeability under the null holds for a transformation that does not depend on the mean, permutation tests will be correct.

--Naomi

Naomi S. Altman
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)

At 11:51 AM 4/5/2005, Rhonda DeCook wrote:
> With respect to permutation tests...
>
> I'm under the impression that you only need independence, not the assumption of constant variance.
>
> The permutation test provides us with a distribution of the test statistic under the null hypothesis (equal means in the 2-sample scenario, i.e. all data were generated from one distribution, even though it may be an ugly-looking single distribution). As long as all 'groupings' of the data into 2 groups are equally likely (which is provided by the independence assumption), this permutation distribution of the test statistic (e.g. a t-statistic here) gives us an idea of the test statistic's distribution under the null without the assumption of normality or constant variance. Computing a permutation p-value from this null distribution provides a p-value that has the usual behavior under the null, i.e. Uniform(0,1), though in a discrete manner. When the alternative is true, the distribution of the p-value will have more mass near zero than the Uniform(0,1).
>
> If this logic doesn't apply to the microarray setting, please let me know.
>
> Rhonda
>
>> I just want to remind people that permutation tests, rank tests, etc. still require i.i.d. errors. So the variance needs to be stabilized even for nonparametric tests.
>>
>> --Naomi
>>
>> At 01:32 PM 4/4/2005, Fangxin Hong wrote:
>>> Hi Marcelo,
>>>
>>> As Wolfgang mentioned, a non-parametric permutation test is an option when the t-distribution assumption is not valid. But if you have few replicates (2-3), most permutation tests don't have power either. I would suggest you try the RankProd package, which would be powerful enough to detect differentially expressed genes with 2 replicates.
>>>
>>> Bests,
>>> Fangxin
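Naomi's counter-example can be checked with a small simulation. This is an illustrative sketch in Python (standard library only), not code from the thread, and it rests on assumptions that are not in the posts: the 5 and 25 in N(a, 5) and N(b, 25) are read as variances, the group sizes (20 and 5) are hypothetical, and the statistic is the difference in group means (for a fixed pooled sample this gives the same two-sided permutation p-value as the pooled-variance t-statistic). The function names are hypothetical. The sketch estimates how often a permutation test at alpha = 0.05 rejects when the null hypothesis a = b is true, first with unequal variances and then with equal variances.

```python
import math
import random


def mean_diff(values, n1):
    """Difference between the mean of the first n1 values and the mean of the rest."""
    n2 = len(values) - n1
    return sum(values[:n1]) / n1 - sum(values[n1:]) / n2


def perm_pvalue(x, y, n_perm, rng):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n1 = len(x)
    observed = abs(mean_diff(pooled, n1))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        # Small tolerance so floating-point rounding does not drop exact ties.
        if abs(mean_diff(pooled, n1)) >= observed - 1e-12:
            hits += 1
    # Adding 1 counts the observed labelling, keeping the Monte Carlo p-value valid.
    return (hits + 1) / (n_perm + 1)


def rejection_rate(sd1, sd2, n1=20, n2=5, n_sim=400, n_perm=200, alpha=0.05, seed=1):
    """Fraction of simulated null data sets (equal means) with p <= alpha.

    Hypothetical design: n1 values from N(0, sd1^2), n2 values from N(0, sd2^2).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        if perm_pvalue(x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Naomi's example, reading 5 and 25 as variances: not exchangeable under the null.
    unequal = rejection_rate(math.sqrt(5), math.sqrt(25))
    # Same design with equal variances: exchangeable under the null.
    equal = rejection_rate(math.sqrt(5), math.sqrt(5))
    print(f"unequal variances: rejection rate {unequal:.3f} at alpha = 0.05")
    print(f"equal variances:   rejection rate {equal:.3f} at alpha = 0.05")
```

The group sizes are deliberately unbalanced, with the larger variance in the smaller group. In that setting the permutation distribution is narrower than the true null distribution of the statistic, so the unequal-variance rejection rate should come out well above 0.05, while the exchangeable (equal-variance) case should stay at or just below 0.05. With equal group sizes the difference-in-means permutation test is much less sensitive to unequal variances, so a balanced design can hide the problem.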

written 20.9 years ago by Naomi Altman ★ 6.0k

Dear Fangxin Hong, Gordon Smyth, Kasper Hansen, Naomi Altman, Rhonda DeCook, Wolfgang Huber, and other Bioconductor friends,

Thank you very much! Your posts have been very important to me. Since I sent the first message on this topic and received the first reply, I have taken some time to read about quantile normalization, vsn, and other subjects. Although I have not understood all of the theory, the messages and comments pushed me to study it.

Mr. Smyth wrote something that caught my attention: background correction. That is where he found my mistake. Previously, I had only dealt with the negative values: I had set them to "0". Then, after the log2 transformation, all of those "0" values had become "NA". Therefore, I was running *normalize.quantiles* on a matrix containing "NA" values.

After Mr. Smyth's email, I replaced the negative values and zeros in each column (each array) with half of the smallest positive value in that column, following backgroundCorrect with method = "minimum". I did this in OOCalc, because I was working with a matrix, not an RGList. After that, I got significant results with both the raw data and the log2-transformed data, although the results were slightly different.

Now I have other questions, but I will ask them in another topic.

Thanks a lot!
Marcelo
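The corrected preprocessing described above (set non-positive values to half the smallest positive value in each array, take log2, then quantile-normalize) can be sketched as follows. This is an illustrative Python (standard library) sketch, not the R code used in the thread: the flooring step follows the documented rule of limma's backgroundCorrect with method = "minimum", the quantile normalization is a simplified version (ties broken by position) rather than the normalize.quantiles implementation, and the function names and example intensities are hypothetical. It also shows why the flooring matters: log2 is undefined for zero and negative values (in R, log2(0) is -Inf and log2 of a negative number is NaN), which is how non-finite values end up in the matrix passed to the normalization.

```python
import math


def floor_nonpositive(column):
    """Replace values <= 0 by half of the smallest positive value in the column.

    Mirrors the documented rule of limma's backgroundCorrect(method = "minimum");
    raises ValueError if the column has no positive values.
    """
    positive = [v for v in column if v > 0]
    if not positive:
        raise ValueError("column has no positive values")
    floor = min(positive) / 2
    return [v if v > 0 else floor for v in column]


def quantile_normalize(columns):
    """Simplified quantile normalization of equal-length columns (one per array).

    Every column receives the same set of values: the mean, across arrays, of
    the k-th smallest value. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized_cols = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n), key=col.__getitem__)
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = reference[rank]
        normalized_cols.append(normalized)
    return normalized_cols


def preprocess(columns):
    """Floor non-positive intensities, log2-transform, then quantile-normalize."""
    logged = [[math.log2(v) for v in floor_nonpositive(col)] for col in columns]
    return quantile_normalize(logged)


if __name__ == "__main__":
    # Two hypothetical arrays (columns) with four spots each; made-up intensities.
    arrays = [[100.0, -5.0, 400.0, 0.0], [50.0, 200.0, 800.0, 25.0]]
    for col in preprocess(arrays):
        print([round(v, 3) for v in col])
```

Because quantile normalization uses only the ranks within each array and log2 is monotone, taking logs before or after normalizing gives the same ordering of spots; the choice changes only the scale on which the reference quantiles are averaged.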

written 20.9 years ago by Marcelo Luiz de Laia ▴ 770
