ttest or fold change
Jason Hipp ▴ 40
@jason-hipp-557
I am comparing a relatively homogeneous cell culture to another that has been treated, and am using RMA.

I only have 3 replicates of each. Would you recommend a two-tailed, equal-variance t-test?
I also thought I read that with so few replicates, a fold change would be better than a t-test?
If I get a t-test p-value of .0001 and a fold change of 1.2, is this a reliable change using RMA?

Thanks,
Jason
@rafael-a-irizarry-205
In my experience with Affymetrix microarrays, the t-test is close to powerless with only 3 replicates. Fold change works much better. Things like SAM (see the siggenes package for the SAM statistic and the limma package for SAM-like statistics) are much better as well.

If you do use a t-test, make sure to look at a volcano plot (-log p-value versus average log fold change). Typically you see many, many genes with very small p-values but log fold changes very close to 0. These are likely not of interest (the denominator of the t-test was small by chance).

On Sat, 13 Dec 2003, Jason Hipp wrote:

> If I get a t test of .0001, and a fold change of 1.2, is this a reliable change using RMA?
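The small-denominator effect described in this reply can be checked with a quick simulation. The following is only a sketch in Python (not Bioconductor code), under assumed settings: 10,000 genes with no true difference, 3 replicates per group, normally distributed log2 expression with a common standard deviation of 0.25, and the tabulated two-sided 1% critical value of Student's t with 4 degrees of freedom. All function names are hypothetical.

```python
import math
import random
import statistics

T_CRIT_P01_DF4 = 4.604  # tabulated two-sided 1% critical value, Student's t, 4 df


def pooled_t(x, y):
    """Equal-variance two-sample t statistic, mean difference, and pooled SD."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_sd = math.sqrt(ss / (nx + ny - 2))
    t = (mx - my) / (pooled_sd * math.sqrt(1 / nx + 1 / ny))
    return t, mx - my, pooled_sd


def simulate_null_genes(n_genes=10000, n_rep=3, sd=0.25, seed=2003):
    """Simulate genes with NO true difference between the two groups."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_genes):
        control = [rng.gauss(8.0, sd) for _ in range(n_rep)]
        treated = [rng.gauss(8.0, sd) for _ in range(n_rep)]
        out.append(pooled_t(control, treated))
    return out


results = simulate_null_genes()
hits = [r for r in results if abs(r[0]) > T_CRIT_P01_DF4]

print(len(hits), "of", len(results), "null genes pass the 1% t-test cutoff")
print("median pooled SD, all genes:", round(statistics.median(r[2] for r in results), 3))
print("median pooled SD, hits:     ", round(statistics.median(r[2] for r in hits), 3))
print("median |log2 difference|, hits:",
      round(statistics.median(abs(r[1]) for r in hits), 3))
```

The genes that pass the cutoff here are, by construction, all false positives, and they tend to be the ones whose pooled standard deviation happened to come out small. Plotting the mean difference against -log p for such data gives the cluster of "significant" points near zero fold change that the volcano plot is meant to expose.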
A.J. Rossini ▴ 810
@aj-rossini-209
"Jason Hipp" <jhipp@wfubmc.edu> writes:

> If I get a t test of .0001, and a fold change of 1.2, is this a reliable change using RMA?

You need to consider:

1. What is that value compared to the others from the experiment?
2. Are you doing an exploratory analysis to be confirmed later, or is this part of the final scientific justification?
3. You can't treat it like a black box. Context-free science is pretty much content-free (i.e., what cell culture, what treatment, should the differences be large a priori, etc.).

If you are just generating hypotheses, it is probably reasonable, based on MY ASSUMPTIONS. Of course, you are probably violating those, since I'm not telling you what they are and you have no clue as to whether I'm reasonable or insane this morning.

(Think about that for a bit and re-read what you wrote... How can we know what you've done from the above? How can you even come close to justifying equal variance? Are you just looking for a way to order the results?)

best,
-tony

--
rossini@u.washington.edu
http://www.analytics.washington.edu/
Biomedical and Health Informatics, University of Washington
Biostatistics, SCHARP/HVTN, Fred Hutchinson Cancer Research Center
Ramon Diaz ★ 1.1k
@ramon-diaz-159
Dear Jason,

First, I think you should recognize that three replicates are very few, and thus conclusions will not be particularly trustworthy. I assume this is a first round of screening for relevant genes for subsequent studies.

Second, I think the fold-ratio vs. t-test issue can often muddle two different questions: a) is there statistical evidence of differential expression; b) is the expression of gene X altered in a biologically relevant way (where biologically relevant means more than Z times). If you had a large number of samples you might be able to detect as "statistically significant" very small log ratio changes (which might, or might not, be biologically relevant); conversely, what if the fold change is large but the variance is huge? For reasons I don't understand, the two-fold change sometimes has a sacrosanct status, but it is my understanding that other fold changes (say 1.3 or 3.5) could, in certain cases, be much more biologically relevant; this, of course, depends on the context.

In your case, the t-test has an additional potential problem with the denominator. I would suggest using some procedure, such as the empirical Bayes one in limma, that will use a modified expression for the denominator and save you from finding some very small p-values just because that gene has, by chance, an artificially small variance.

So I would use limma (or something like it) and also filter by some criterion that biologists tell you is relevant for them (say, we only want genes that are overexpressed at least 5 times, or whatever).

Best,

R.

On Saturday 13 December 2003 16:45, Jason Hipp wrote:

> If I get a t test of .0001, and a fold change of 1.2, is this a reliable change using RMA?

--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
http://bioinfo.cnio.es/~rdiaz
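The "modified expression for the denominator" mentioned in this reply can be illustrated with a short Python sketch. It is not limma's implementation: limma estimates the prior variance and prior degrees of freedom from all genes on the array, whereas here they are simply passed in as assumed values (`prior_var`, `prior_df`, both hypothetical names). The shrunken variance is the degrees-of-freedom-weighted average of the prior variance and the gene's own pooled variance.

```python
import math


def pooled_variance(x, y):
    """Pooled within-group variance and its degrees of freedom."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = len(x) + len(y) - 2
    return ss / df, df


def ordinary_t(x, y):
    """Classical equal-variance two-sample t statistic."""
    s2, _ = pooled_variance(x, y)
    diff = sum(x) / len(x) - sum(y) / len(y)
    return diff / math.sqrt(s2 * (1 / len(x) + 1 / len(y)))


def moderated_t(x, y, prior_var, prior_df):
    """t statistic whose variance is shrunk toward an assumed prior variance."""
    s2, df = pooled_variance(x, y)
    shrunk = (prior_df * prior_var + df * s2) / (prior_df + df)
    diff = sum(x) / len(x) - sum(y) / len(y)
    return diff / math.sqrt(shrunk * (1 / len(x) + 1 / len(y)))


# Hypothetical gene: a 1.2-fold change with replicates that happen to agree closely.
control = [8.00, 8.01, 7.99]
treated = [v + math.log2(1.2) for v in control]

t_ord = ordinary_t(treated, control)
t_mod = moderated_t(treated, control, prior_var=0.04, prior_df=4)
print("ordinary t: ", round(t_ord, 1))
print("moderated t:", round(t_mod, 1))
```

With these made-up numbers the ordinary statistic is very large only because the three replicates agree unusually well; once the variance is pulled toward a more typical value the same 1.2-fold change looks unremarkable. Setting `prior_df=0` recovers the ordinary t statistic.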
@michael-watson-iah-c-378
Why not try the non-parametric t-tests available?

I know all the arguments about a "loss of power" etc., but at the end of the day, as statisticians and bioinformaticians, sometimes biologists come to us with small numbers of replicates (for very understandable reasons) and it is our job to get some meaning out of that data. Trying to fit any kind of statistic involving a p-value to such data is a difficult and risky task, and trying to explain those results to the biologist is often very difficult.

So here's what happens with the non-parametric tests based on ranking. Those genes with the highest |t| are those where all the replicates of one condition are greater than all the replicates of the other condition. The next highest |t| is where all but one of the replicates of one condition are greater than all the replicates of the other condition, etc.

OK, so some of these differences could occur by chance, but we're often dealing with millions of data points and I really don't think it's possible to make no mistakes. And curse me if you like, but if I have a gene expression measurement, replicated 5 times in two conditions, and in one condition all five replicates are higher than the five replicates of the other condition, then I believe that that gene is differentially expressed. And that's easy to find with a non-parametric t, and it is easy to explain to a biologist, and at the end of the day, is it really wrong to do that?

-----Original Message-----
From: Ramon Diaz-Uriarte [mailto:rdiaz@cnio.es]
Sent: 15 December 2003 11:54
To: Jason Hipp; bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] ttest or fold change
@michael-newton-456
Hi,

My own calculations have also shown a lack of sensitivity of both t-testing and other approaches when we have few replicates. You might be interested in a mixture approach that seems promising. See http://www.stat.wisc.edu/~newton/papers/abstracts/tr1074a.html for code and a paper.

Michael Newton

p.s. That site contains a major revision of the report I released last January, with code etc. recently updated; aiming for Bioconductor soon!
You may try LPE (under developmental packages on Bioconductor), which is suited to significance analysis with a low number of replicates.

Regards,
-Nitin
@baker-stephen-469
Dr. Baker,

You wrote about "the problem" that the t-test denominator may be accidentally "too small". You say that this issue has been solved within the t-test. It is my belief that this problem has only been partially solved. It is true that this "problem" has been solved for a single hypothesis test within the t-test, but it has not been solved for microarray data analysis as a whole.

It is possible to gain power by using local estimates of variance based upon more than one gene. This sort of approach is extremely useful for experiments with only a few replicates because it deals with the situation where the within-group variance for a single gene happens to be very small. This is the approach implemented in Cyber-T: http://visitor.ics.uci.edu/genex/cybert/. By looking at the dataset as a whole, rather than 1 gene at a time, it is possible to eliminate false positives that arise as a result of coincidentally low within-group variance.

Do you agree? Other than this minor point, I think you did a wonderful job putting the statistical concepts that so many struggle with into words.

Garrett Frampton
Research Associate
Boston University School of Medicine - Microarray Resource

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch [mailto:bioconductor-bounces@stat.math.ethz.ch] On Behalf Of Baker, Stephen
Sent: Monday, December 15, 2003 2:15 PM
To: bioconductor@stat.math.ethz.ch
Subject: RE: [BioC] ttest or fold change

Second,

With respect to t-tests, a couple of people have mentioned "the problem" that the t-test denominator may be accidentally "too small". This is because the t-test uses an ESTIMATE of the variance from the sample itself. This is what William Sealy Gosset, otherwise known as "Student", discovered, and it prompted him to develop the t-distribution and t-test. Gosset, or Student, was a brewmaster for Guinness breweries in Dublin and was doing experiments with hops and things when he discovered that the well-known "normal distribution" was inaccurate when you estimated the variance from a sample. He empirically developed the t-distribution, which takes the variability in the variance estimate into account, so that the t-test is ALREADY ADJUSTED to compensate for weird values in the denominator due to random sampling.

One thing that I think is too often ignored is that different genes have different variances. The fact that one gene appears to have a smaller variance than its neighbors (or a larger one) could be because it ACTUALLY DOES have a larger or smaller variance, OR it may be due to sampling variability. The t-test assumes the former but adjusts for the latter possibility. It worked then and it works now; it is NOT a problem.

Student's friend, the genius R.A. Fisher, took Student's empirical result and worked out the theory on which analysis of variance is all based. This theory has withstood the test of time: it is about 100 years old and still holds. Given that the assumptions are correct, t-tests and ANOVA are still "uniformly most powerful tests".

-.- -.. .---- .--. ..-.
Stephen P. Baker, MScPH, PhD (ABD)
Sr. Biostatistician - Information Services
Lecturer in Biostatistics, Graduate School of Biomedical Sciences
University of Massachusetts Medical School, Worcester
55 Lake Avenue North, Worcester, MA 01655 USA
stephen.baker@umassmed.edu
(508) 856-2625, (775) 254-4885 fax

------------------------------
Message: 3
Date: Mon, 15 Dec 2003 12:11:43 -0000
From: "michael watson (IAH-C)" <michael.watson@bbsrc.ac.uk>
Subject: RE: [BioC] ttest or fold change
To: bioconductor@stat.math.ethz.ch
@michael-watson-iah-c-378
> This seems small but with a microarray with thousands of genes, this
> easily produces a bunch of false positives. I looked at 10 chips from a
> real control group arbitrarily labeling 5 chips as control and 5 as
> experimental. I would by theory expect 35 false positives and got
> exactly 32, that is 32 situations in which all the low ranks were in one
> group and the high ranks in the other. For a chip with 22000 genes, you
> would expect 175 false positive results by this criteria. Standard
> statistical methods would give you a specified type I error rate that
> you can count on, it would have found NONE of the genes significant
> (i.e. bonferroni adjustment)

A truly excellent reply, and one which I will no doubt refer to frequently; I am still very much a novice statistician.

However, and please correct me if I am wrong, I presume that some scientists are as afraid of false negatives as of false positives? That is, if we are so conservative that we try to ENSURE that there are NO false positives, we may throw away genes as not differentially expressed when in reality they are.

It will be interesting to have a discussion on this: is it possible, using statistics, to guarantee both no false positives and no false negatives? If not, then surely the investigator must decide which is relevant to the study in question before going on to decide which stats to use.
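The figure of about 175 chance results on a 22000-gene chip, quoted in the message above, follows from a simple counting argument, sketched here in Python. The assumptions are that there are no tied values and that, with no real effect, every assignment of the 2n arrays to two groups of n is equally likely; the function names are hypothetical.

```python
import math


def perfect_separation_prob(n_per_group):
    """Chance that one group's n values all exceed the other's, in either direction."""
    return 2 / math.comb(2 * n_per_group, n_per_group)


def expected_by_chance(n_genes, n_per_group):
    """Expected number of perfectly separated genes when no gene truly differs."""
    return n_genes * perfect_separation_prob(n_per_group)


for n in (3, 5):
    print(n, "replicates per group:",
          "probability", round(perfect_separation_prob(n), 4),
          "expected among 22000 genes", round(expected_by_chance(22000, n)))
```

Under the same assumptions, the separation probability is also the smallest two-sided p-value an exact rank test can produce, so with 5 replicates per group it cannot fall below 2/252. That is far above a Bonferroni threshold of 0.05/22000, which is consistent with the remark that the adjusted analysis found no significant genes.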
@baker-stephen-469
Of course investigators want to avoid false negatives as well as false positives, but you can promise neither zero false positives nor zero false negatives, except in the trivial case where one classifies 100% as positive or 100% as negative. The best you can do is to quantify the probabilities and then trade off one for the other, i.e. decreasing the probability of one type of error increases the probability of the other. However, there is an arbitrary but real de facto standard of 5% for type I error, which limits how much the tradeoff can be manipulated.

New approaches such as mixture models offer some promise of improvement, and use of the False Discovery Rate can make a very big difference in the number of regulated genes detected. I think this is underutilized.

Fortunately, the REAL answer is under the control of the investigator with the help of the statistician. That is the process of power analysis, i.e. the statistician can help the investigator calculate the number of microarrays that are needed to provide a desired probability of detecting an effect of a specified size (in fold changes). There is no effect that is too subtle to detect with enough data.

Of course, there is no such thing as a free lunch and microarrays are still expensive (but getting cheaper). Then again, compared with the cost of any other technology, microarrays are incredibly inexpensive considering the amount of data they produce. Imagine the cost in materials and labor to do PCR on 10,000 or 20,000 genes!

The studies we are seeing are getting larger and larger. Funding agencies are funding well-prepared proposals for large studies with many microarrays (i.e. enough to detect meaningful effects) based on small studies of a few microarrays. These small studies are then pilot studies, and pilot studies do not need to be "definitive" to be useful. They just may not be publishable on their own.

Stephen P. Baker, MScPH, PhD (ABD)
Senior Biostatistician, Academic Computing Services
Lecturer in Biostatistics, Graduate School of Biomedical Sciences
University of Massachusetts Medical School
55 Lake Avenue North, Worcester, MA 01655 USA
stephen.baker@umassmed.edu

----- Original Message -----
From: michael watson (IAH-C)
To: Baker, Stephen ; bioconductor@stat.math.ethz.ch
Sent: Tuesday, December 16, 2003 4:46 AM
Subject: RE: [BioC] ttest or fold change
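As a rough illustration of the power analysis described in this message, the Python sketch below uses the textbook normal-approximation formula for a two-group comparison on the log2 scale, n per group = 2 (z_{1-alpha/2} + z_{power})^2 sd^2 / delta^2 with delta = log2(fold change). The per-gene standard deviation of 0.5 is an assumed value for illustration only, and the function name is hypothetical. Because the formula uses normal rather than t quantiles, it understates the requirement when n is very small.

```python
import math
from statistics import NormalDist


def arrays_per_group(fold_change, sd_log2, alpha=0.05, power=0.80):
    """Approximate arrays needed per group to detect a given fold change.

    sd_log2 is the assumed within-group SD of log2 expression for the gene.
    alpha is the two-sided per-gene type I error rate; lower it to mimic a
    multiple-testing correction.
    """
    delta = abs(math.log2(fold_change))
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd_log2 / delta) ** 2)


for fc in (2.0, 1.5, 1.2):
    print("fold change", fc, "->", arrays_per_group(fc, sd_log2=0.5), "arrays per group")

# A stricter per-gene alpha raises the requirement further.
print("fold change 1.2 at alpha = 0.001 ->",
      arrays_per_group(1.2, sd_log2=0.5, alpha=0.001), "arrays per group")
```

The point of the sketch is the scaling: the required number of arrays grows with the square of sd/delta, so a 1.2-fold change needs many more replicates than a 2-fold change at the same variability.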
@stephen-henderson-71
Yes. This is exactly the idea behind the False Discovery Rate (FDR) algorithms that adjust p-values, which you can find described in both the multtest and limma packages.
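One widely used FDR adjustment is the Benjamini-Hochberg step-up procedure. The Python sketch below shows that procedure next to a Bonferroni adjustment for comparison; it is an illustration of the idea, not the code used by multtest or limma, and the function names and example p-values are made up.

```python
def bonferroni_adjust(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # made-up example values
print("Bonferroni:", bonferroni_adjust(pvals))
print("BH:        ", bh_adjust(pvals))
```

Genes whose BH-adjusted value falls below a chosen level (say 0.05) form a list in which the expected proportion of false positives is controlled at that level, under the procedure's assumptions. This is less strict than controlling the chance of any false positive at all, so fewer truly changed genes are thrown away.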
Crispin Miller ★ 1.1k
@crispin-miller-264
@james-w-macdonald-5106
Crispin Miller ★ 1.1k
@crispin-miller-264
Hi Jim,

> You only have to adjust for the multiple comparisons you have made,
> not those you could have made.

I take your point, but I'm still concerned that a hypothesis I would have accepted six months ago is now something I'd reject. If nothing else, because I have to explain why :-)

I think there is a wider issue about the logical decisions that need to be made about how one groups the data in order to apply a correction...

Another example: Scientist A is interested in what is up-regulated by his transcription factor. He does a real-time experiment with replicates and finds a gene with significant induction.

Scientist B is also interested in the same transcription factor. She does a similar real-time experiment on a different gene (but with the same transcription factor). Does she do multiple testing correction to take into account the previous work by A? (I think this is very similar to the new-v-old chips question, but with different numbers.)

I did a quick search on the web and this sort of thing appears to have been discussed quite a lot before, e.g. 'To Bonferroni or not to Bonferroni...' Cabin and Mitchell (2000), which is cited a fair number of times by articles in ecology journals...

Crispin
@baker-stephen-469
Garrett et al,

The t-test (or ANOVA) does not have a problem with "accidentally too small" variances, either with one or with more than one outcome of interest. The estimate of the error variance by t-tests and ANOVA is a least squares estimate and is the UNBIASED ESTIMATOR that is also the lower bound on the variance for the "best" (minimum variance) linear unbiased estimator (BLUE) of the effects being tested (see Graybill 1976).

Some Bayesian methods can generate smaller estimates of variances by biasing the estimate toward some overall measure, such as the average of variances for nearby genes. These are BIASED estimates based on an assumption that a particular gene should really be like genes that are "nearby" in some sense, such as having similar expression levels. You would have to present a lot of data to convince me that any randomly selected gene should have a variance like some other set of genes, especially when I have an unbiased estimate at hand that is non-controversial, requires no defense, and uses methods that have withstood 100 years of review and scrutiny. I'm familiar with shrunken estimates of effects that can have a smaller "mean squared error", but these are random effects, not variances, which control the power and type I error rate.

These approaches, in addition to producing biased estimates, sometimes require the analyst to impose his or her own particular biases, called "prior beliefs" or "priors", as to how much these estimates should be biased: the analyst must input how much weight is given to the data from that gene and how much weight is given to the other set that the gene is supposed to "be more like". Again, it would take some pretty strong arguments to convince me to accept any particular analyst's prior beliefs about how much the data for a gene or data from other genes should or should not be weighted. I would be concerned about how much convincing a readership, reviewer, or study group would need if they ever decided to "open the black box" and ask me to explain why such an approach is reasonable or justifiable.

The program Garrett mentioned, Cyber-T, uses such an approach. To quote the Cyber-T manual: "...This weighting factor IS CONTROLLED BY THE EXPERIMENTER AND WILL DEPEND ON HOW CONFIDENT THE EXPERIMENTER IS that the background variance of a closely related set of genes approximates the variance of the gene under consideration". Now if one were looking at just ONE gene, it makes sense that someone might put a lot of thought into it, have looked at a lot of similar genes or other data, come to the conclusion that a gene should be like some other genes, and THEN use this approach. But this is not the case when you have 10,000 or 22,000 genes, at least not in the world I'm familiar with.

I use empirical Bayes methods for fitting general linear mixed models, where the priors are objective, not my own opinion. Cyber-T does offer the option of setting low confidence in the prior, which is an objective prior, but the manual points out that this results in the standard Student t-test! Another feature of Cyber-T is that when you have "enough" data, the weighted approach converges to the standard t-test as well.

The real problem that researchers face with microarrays is NOT that their t-test variances are too small, but that they often have insufficient sample to detect the differences they need to detect. The ready solution is to get enough data.

Stephen P. Baker, MScPH, PhD (ABD)
Sr. Biostatistician - Information Services
Lecturer in Biostatistics, Graduate School of Biomedical Sciences
University of Massachusetts Medical School, Worcester
stephen.baker@umassmed.edu

------------------------------
Message: 6
Date: Tue, 16 Dec 2003 10:24:31 -0500
From: "Garrett Frampton" <gmframpt@bu.edu>
Subject: RE: [BioC] ttest or fold change
To: <bioconductor@stat.math.ethz.ch>
Dear Stephen,

Thank you for your detailed comments. Two points:

1. It is my understanding that there are other issues at stake besides the unbiasedness of the error variance estimate, but I'll leave the technical discussion to others who are much more capable. However, the results in, for example, Lönnstedt & Speed (2002, Statistica Sinica, 12:31--46), or Smyth (2003, http://www.statsci.org/smyth/pubs/ebayes.pdf), or in Qin & Kerr at the IMA Workshop (http://www.ima.umn.edu/talks/workshops/9-29-10-3.2003/kerr/KerrIMA.pdf), seem to indicate, with both simulated and "wet lab" data, that we can do much better (in terms of false positives and false negatives) using t-like tests that combine information across genes than with the standard t-test.

2.
> The real problem that researchers face with microarrays is NOT that
> their t-test variances are too small, but that they often have
> insufficient sample to detect the differences they need to detect. The
> ready solution is to get enough data.

I do agree with the general point. In a previous incarnation I used to do behavioral ecology and help field biologists with their data. It was not unheard of (in areas with a lot less funding than molecular biology) to spend two years in the field following some creatures to try to get decent sample sizes (and maybe one or two papers out). The answer to small sample sizes was often more field seasons, not shortcuts in the data analysis. I often interact with molecular biologists and MDs who persevere in using tiny sample sizes for "serious stuff". This concerns me a lot (as both a statistician and a potential patient who might one day seek treatment!).

Best,

R.

On Tuesday 16 December 2003 23:45, Baker, Stephen wrote: > Garrett et al, > > The t-test (or ANOVA) does not have a problem with "accidentally too > small" variances, either with one or more than one outcome of interest.
> The estimate of the error variance by t-tests and ANOVA is a Least > Squares estimate and is the UNBIASED ESTIMATOR that is also the lower > bound on the variance for the "best" (minimum variance) linear unbiased > estimator (BLUE) of the effects being tested (see Graybill 1976). > > Some bayesian methods can generate smaller estimates of variances by > biasing the estimate toward some overall measure such as the average of > variances for nearby genes. These are BIASED estimates based on an > assumption that a particular gene should really be like genes that are > "nearby" in some sense, such as they have similar expression levels. > You would have to present a lot of data to me to convince me that any > randomly selected gene should have a variance like some other set of > genes, especially when I have an unbiased estimate at hand that is > non-controversial, requires no defense, and uses methods that have > withstood 100 years of review and scrutiny. I'm familiar with shrunken > estimates of effects that can have a smaller "mean squared error", but > these are random effects, not variances which control the power and type > I error rate. > > These approaches, in addition to producing biased estimates sometimes > require the analyst to impose his or her own particular biases, called > "prior beliefs" or "priors" on as to how much these estimates should be > biased by requiring that the analyst input how much weight is given to > the data from that gene and how much weight is given to the other set > that the gene is supposed to "be more like". Again, it would take some > pretty strong arguments to convince me that any particular analysts > prior beliefs about how much the data for a gene or data from other > genes should or should not be weighted. I would be concerned about how > much convincing a readership, reviewer, or study group would need if > they ever decide to "open the black box" and ask me to explain why such > an approach is reasonable/justifiable. 
> > The program Garrett mentioned, Cyber-T, uses such an approach. To quote > the Cyber-T manual "...This weighting factor IS CONTROLLED BY THE > EXPERIMENTER AND WILL DEPEND ON HOW CONFIDENT THE EXPERIMENTER IS that > the background variance of a closely related set of genes approximates > the variance of the gene under consideration". Now if one was looking > at just ONE gene, it makes sense that someone might put a lot of > thought into it, have looked at a lot of similar genes or other data and > come to the conclusion that a gene should be like some other genes and > THEN use this approach. But this is not the case when you have 10,000 > or 22,000 genes, at least not in the world I'm familiar with. > > I use empirical bayes methods for fitting general linear mixed models, > where the priors are objective, not my own opinion. Cyber-T does offer > the option of setting low confidence in the prior which is an objective > prior, but the manual points out that this results in the standard > Student t-test! Another feature of Cyber-T is that when you have > "enough" data, the weighted approach converges into the standard t-test > as well. > > The real problem that researchers face with microarrays is NOT that > their t-test variances are too small, but that they often have > insufficient sample to detect the differences they need to detect. The > ready solution is to get enough data. > > -.- -.. .---- .--. ..-. > Stephen P. Baker, MScPH, PhD (ABD) (508) 856-2625 > Sr. 
Biostatistician- Information Services > Lecturer in Biostatistics (775) 254-4885 fax > Graduate School of Biomedical Sciences > University of Massachusetts Medical School, Worcester > 55 Lake Avenue North stephen.baker@umassmed.edu > Worcester, MA 01655 USA > > ------------------------------ > > Message: 6 > Date: Tue, 16 Dec 2003 10:24:31 -0500 > From: "Garrett Frampton" <gmframpt@bu.edu> > Subject: RE: [BioC] ttest or fold change > To: <bioconductor@stat.math.ethz.ch> > Message-ID: <00b801c3c3e8$b3ed2cc0$e1be299b@GARRETT> > Content-Type: text/plain; charset="US-ASCII" > > Dr. Baker, > > You wrote about "the problem" that the t-test denominator may be > accidentally "too small". You say that this issue has been solved > within the T-test. It is my belief that this problem has only been > partially solved. It is true that this "problem" has been solved for a > single hypothesis test within the T-test, but it has not been solved for > microarray data analysis as a whole. > > It is possible to gain power by using local estimates of variance based > upon more than one gene. This sort of approach is extremely useful for > experiments with only a few replicates because it deals with the > situation where the within group variance for a single gene happens to > be very small. This is the approach implemented in Cyber-T; > http://visitor.ics.uci.edu/genex/cybert/. By looking at the dataset as > a whole, rather than 1 gene at a time, it is possible to eliminate > false-positives that arise as a result of coincidentally low within > group variance. > > Do you agree? > Other than this minor point I think you did a wonderful job putting the > statistical concepts that so many struggle with into words. 
> > > Garrett Frampton > Research Associate > Boston University School of Medicine - Microarray Resource > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor -- Ram?n D?az-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncol?gicas (CNIO) (Spanish National Cancer Center) Melchor Fern?ndez Almagro, 3 28029 Madrid (Spain) Fax: +-34-91-224-6972 Phone: +-34-91-224-6900 http://bioinfo.cnio.es/~rdiaz PGP KeyID: 0xE89B3462 (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
@garrett-frampton-434
Last seen 6.6 years ago
Dr. Baker,

Thank you very much for the reply. It was quite enlightening and I agreed with almost everything, particularly the idea that there is no substitute for collecting enough data to have the power to see the changes that you are looking for.

Nevertheless, it will be a long time before we can get away from analyzing small datasets (3 vs 3 for example). It is often important to perform a small study in order to get preliminary data for a larger one. In fact, in most cases this would be advisable in order to get an idea of technical and biological variability prior to designing the larger study. Consequently, it is important to be able to analyze small datasets.

Suppose that we have a large dataset from a study with two experimental conditions (100 vs 100). Assume that there are large, reproducible differences (many fold, many standard deviations) between the conditions for a number of genes (1-5% of the data). A T-test can be used on this dataset to define a group of differentially expressed genes. Select 3 samples at random from each group and use two statistical tests, a T-test and the Bayesian T-test implemented in Cyber-T. At any significance cut-off, the genes found to be differentially expressed by the Bayesian T-test will be in much better agreement with the genes found by a T-test from the 100 samples than those found by the regular T-test will be.

I think that this is at odds with your conclusion.

GMF
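The thought experiment above (100 vs 100 as the reference, then 3 vs 3 analysed two ways) can be tried on simulated data. What follows is only a rough sketch under stated assumptions: the gene count, effect size, lognormal spread of gene-specific standard deviations and the prior weight `d0` are all arbitrary, and the shrinkage is a generic weighted average toward the mean variance, not Cyber-T's own estimator. It says nothing about how either method behaves on real arrays.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def pooled_var(x, y):
    """Equal-variance pooled sample variance of two groups."""
    mx, my = mean(x), mean(y)
    ss = sum((a - mx) ** 2 for a in x) + sum((b - my) ** 2 for b in y)
    return ss / (len(x) + len(y) - 2)


def t_stats(data, n, d0=0.0, s0_sq=0.0):
    """Per-gene t-like statistics from the first n samples of each group.

    With d0 = 0 this is the ordinary equal-variance t statistic; with d0 > 0
    each gene's variance is shrunk toward the background value s0_sq.
    """
    d = 2 * n - 2
    out = []
    for x, y in data:
        x, y = x[:n], y[:n]
        var = (d0 * s0_sq + d * pooled_var(x, y)) / (d0 + d)
        out.append((mean(y) - mean(x)) / math.sqrt(var * 2 / n))
    return out


def top_genes(stats, k):
    """Indices of the k genes with the largest absolute statistic."""
    order = sorted(range(len(stats)), key=lambda g: abs(stats[g]), reverse=True)
    return set(order[:k])


def simulate(n_genes=1000, n_full=100, frac_de=0.03, seed=1):
    """Simulated log2 expression: one (control, treated) pair of lists per gene."""
    rng = random.Random(seed)
    data = []
    for g in range(n_genes):
        sd = math.exp(rng.gauss(-1.5, 0.5))  # gene-specific SD (assumed lognormal)
        shift = rng.choice([-1, 1]) * 1.5 if g < frac_de * n_genes else 0.0
        x = [rng.gauss(8.0, sd) for _ in range(n_full)]
        y = [rng.gauss(8.0 + shift, sd) for _ in range(n_full)]
        data.append((x, y))
    return data


data = simulate()
k = 30
reference = top_genes(t_stats(data, 100), k)  # gene list from the 100 vs 100 t-test

# Samples are i.i.d., so the first 3 per group stand in for a random draw of 3.
# Background variance: mean of the per-gene pooled variances in the 3 vs 3 subset.
s0_sq = mean([pooled_var(x[:3], y[:3]) for x, y in data])

ordinary = top_genes(t_stats(data, 3), k)
moderated = top_genes(t_stats(data, 3, d0=4.0, s0_sq=s0_sq), k)

agree_ordinary = len(ordinary & reference) / k
agree_moderated = len(moderated & reference) / k
print(f"agreement with the 100 vs 100 list: "
      f"t-test {agree_ordinary:.2f}, shrunken variance {agree_moderated:.2f}")
```

Changing `seed`, `d0` or the spread of the simulated standard deviations shows how sensitive the comparison is to those assumptions, which is the part of the argument that would need real data to settle.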
Hello all,

I have been following the recent exchanges Re: ttest or fold change with a lot of interest, particularly the limitations of small sample sizes (i.e., 3 chips per treatment). The question I would like to raise is that for Affy chips, why not use the probe-level values instead of summary values for statistical tests as Chu, Weir & Wolfinger (2002, Math. Biosci. 176:35-51) suggest? It seems like you throw away a lot of power to detect differences when you summarize 14 or 20 PM probes into one summary value. In fact, the median polish used by RMA to summarize is a linear additive model somewhat similar to the mixed model used by Chu et al.; RMA only considers the probes from one chip, while Chu et al.'s uses the probes from all chips, along with cell line, treatment, and interaction effects (I still use the gcrma background correction and normalization on PM values).

I'm not suggesting that this is a good substitute for conducting more replicates (I, too, am from a behavioral ecology background and tend to think an adequate sample size is at least 15-20), but I think it is a way to get more accurate information on differential expression from only a few replicates. I would like to get your thoughts on whether this is or isn't a valid method for analysis and why.

Thanks,
Jenny

Jenny Drnevich, Ph.D.
Department of Animal Biology
University of Illinois
515 Morrill Hall
505 S Goodwin Ave
Urbana, IL 61801
USA

ph: 217-244-6826
fax: 217-244-4565
e-mail: drnevich@uiuc.edu
I am currently preparing a manuscript discussing this issue in relation to the RMA model (or variants of it). Some of this is implemented in the affyPLM package.

I am a little perplexed by your statement that RMA only considers probes from individual chips. Perhaps you need to take a closer look at the model.

Ben
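For readers who want to see the model in question: RMA's summarization step fits an additive model, log2(PM) = chip effect + probe effect + residual, to the probes of one probe set across all chips in the batch, using Tukey's median polish. The following is a bare-bones sketch of median polish on a probes-by-chips table; it is not the affy or affyPLM implementation, and the function name and toy intensities are made up for illustration.

```python
import statistics


def median_polish(matrix, n_iter=10, tol=1e-6):
    """Tukey's median polish on a rows x columns table.

    Returns (overall, row_effects, col_effects, residuals) such that
    matrix[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [row[:] for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the row medians out of the residuals.
        row_med = [statistics.median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            for j in range(n_cols):
                resid[i][j] -= row_med[i]
        # Re-centre the column effects and move their median into the overall term.
        shift = statistics.median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep the column medians out of the residuals.
        col_med = [statistics.median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Re-centre the row effects in the same way.
        shift = statistics.median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid


# Toy probe set: 3 probes (rows) x 4 chips (columns), exactly additive in log2.
probe_affinity = [-1.0, 0.0, 1.0]
chip_level = [7.0, 8.0, 9.0, 10.0]
pm = [[p + c for c in chip_level] for p in probe_affinity]

overall, probe_eff, chip_eff, resid = median_polish(pm)
expression = [overall + c for c in chip_eff]  # one summary value per chip
print(expression)
```

Because the probe effects are estimated from every chip in the table, the summary value for one chip depends on the other chips as well, which is the point of Ben's remark.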
>I am a little perplexed by your statement that RMA only considers probes
>from individual chips. Perhaps, you need to take a closer look at the
>model.
>
>Ben

Oops, you're right. My brain is starting to shut down in preparation for the holidays! I haven't yet looked at affyPLM, but I will now...

Jenny Drnevich, Ph.D.
Department of Animal Biology
University of Illinois
515 Morrill Hall
505 S Goodwin Ave
Urbana, IL 61801
USA

ph: 217-244-6826
fax: 217-244-4565
e-mail: drnevich@uiuc.edu