Search
Question: Limma: doing multiple paired t-tests in one go...
0
11.0 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:
Dear Philip, You are seeing problems where really there are none. The advice from the limma User's Guide guided you correctly. Your limma session below is perfectly correct. Jim's advice was perfectly correct and helpful. The only problem is that you're insisting on treating the simple paired t-test as a reference method which you believe you have to reproduce. The aim of linear modelling in limma is to give you the best possible test, not to reproduce the test statistic you would have got if you only had half as much data. If all you want is to do is to reproduce ordinary paired t-tests, then you have to analyse Diet_1 and Diet_2 separately with separate linear models. In that case, there's no point in any further discussion. If you are willing to embrace a more general variant of the paired t-test which might be more powerful, then read on. The tests you get from linear modelling will differ from simple paired t-tests because the residual standard deviation is pooled across all your arrays. Your first 4 arrays allow you to test Treatment_Diet_1 vs Control_Diet_1 with 1 residual df. Your second set of 4 arrays allow you to test Treatment_Diet_2 vs Control_Diet_2, also with 1 residual df. When you put all 8 arrays into your linear model, the two residual df are pooled so that you get t-tests for Treatment_Diet_1 vs Control_Diet_1 and for Treatment_Diet_2 vs Control_Diet_2 on 2 residual df. This is usually a more powerful approach. The subtlety to your experiment is that you want to conduct two disconnected tests. The usual model formula in R are not tuned to this situation. The parametrisations you've been using will give you the correct results (because limma will figure it out, and will drop superfluous coefficients as necessary), but they do tend to obscure the true situation. Here is a simpler parametrisation which makes the situation more explicit: > design <- model.matrix(~0+Sibship) > design <- cbind(design,TreatmentvsControl1=c(0,1,0,1,0,0,0,0)) > design <- cbind(design,TreatmentvsControl2=c(0,0,0,0,0,1,0,1)) > design Sibship101 Sibship103 Sibship113 Sibship114 TreatmentvsControl1 TreatmentvsControl2 1 0 0 1 0 0 0 2 0 0 1 0 1 0 3 0 0 0 1 0 0 4 0 0 0 1 1 0 5 1 0 0 0 0 0 6 1 0 0 0 0 1 7 0 1 0 0 0 0 8 0 1 0 0 0 1 This makes it explicit that you there are only 6 independent coefficients in your model, not 7 as your previous design matrices have had. With this design matrix you don't even need to use contrasts. The last two coefficients already represent the two comparisons you want to make. Best wishes Gordon >Date: Thu, 29 Nov 2007 16:42:41 +0100 >From: "Groot, Philip de" <philip.degroot at="" wur.nl=""> >Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go... >To: "James W. MacDonald" <jmacdon at="" med.umich.edu=""> >Cc: bioconductor at stat.math.ethz.ch > >Hello Jim, > >Thank you for your answer! Unfortunately, it does NOT solve my >problem. As a matter of fact: I already tried this myself... Let me >show the problem as follows. First, I reproduce what you emailed me: > > > sib >SibShips > 113 113 114 114 101 101 103 103 >Levels: 101 103 113 114 > > rep >[1] "Control_Diet_1" "Treatment_Diet_1" >"Control_Diet_1" "Treatment_Diet_1" >[5] "Control_Diet_2" "Treatment_Diet_2" >"Control_Diet_2" "Treatment_Diet_2" > > design <- model.matrix(~0+rep+sib) >Warning message: >In model.matrix.default(~0 + rep + sib) : > variable 'rep' converted to a factor > > design > repControl_Diet_1 repControl_Diet_2 repTreatment_Diet_1 repTreatment_Diet_2 >1 1 0 0 0 >2 0 0 1 0 >3 1 0 0 0 >4 0 0 1 0 >5 0 1 0 0 >6 0 0 0 1 >7 0 1 0 0 >8 0 0 0 1 > sib103 sib113 sib114 >1 0 1 0 >2 0 1 0 >3 0 0 1 >4 0 0 1 >5 0 0 0 >6 0 0 0 >7 1 0 0 >8 1 0 0 >attr(,"assign") >[1] 1 1 1 1 2 2 2 >attr(,"contrasts") >attr(,"contrasts")$rep >[1] "contr.treatment" >attr(,"contrasts")$sib >[1] "contr.treatment" > > >Secondly, I continue with the required Limma calculations and show >you the result for a single probe: > > > fit <- lmFit(x.norm, design) >Coefficients not estimable: sib114 > >(Note the error message. I suppose that it can be skipped because I >am not interested in this factor anyway). > > > contrast.matrix <- > makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1", > "repControl_Diet_2-repTreatment_Diet_2"), levels=design) > > contrast.matrix > Contrasts >Levels repControl_Diet_1-repTreatment_Diet_1 > repControl_Diet_1 1 > repControl_Diet_2 0 > repTreatment_Diet_1 -1 > repTreatment_Diet_2 0 > sib103 0 > sib113 0 > sib114 0 > Contrasts >Levels repControl_Diet_2-repTreatment_Diet_2 > repControl_Diet_1 0 > repControl_Diet_2 1 > repTreatment_Diet_1 0 > repTreatment_Diet_2 -1 > sib103 0 > sib113 0 > sib114 0 > > >(makes sense if you ask me) > > > fit <- contrasts.fit(fit, contrast.matrix) > > tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma > > pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE) > > tstat.ord[1,] >repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2 > -0.570853 -1.724101 > > pvalue.ord[1,] >repControl_Diet_1-repTreatment_Diet_1 repControl_Diet_2-repTreatment_Diet_2 > 0.6256902 0.2268312 > >Now let's verify this with a reference method!: > > y_after <- exprs(x.norm)[1,c(2,4)] > > y_after >A96_hA_40_113_1_final.CEL A96_hA_41_114_1_final.CEL > 7.182141 7.480890 > > y_before <- exprs(x.norm)[1,c(1,3)] > > y_before >A96_hA_09_113_1_base.CEL A96_hA_10_114_1_base.CEL > 7.230783 7.363846 > > t.verify <- t.test(y_before, y_after, paired=T) > > print(t.verify) > Paired t-test >data: y_before and y_after >t = -0.4128, df = 1, p-value = 0.7507 >alternative hypothesis: true difference in means is not equal to 0 >95 percent confidence interval: > -1.086823 1.018420 >sample estimates: >mean of the differences > -0.03420119 > >This does not agree! However, when I exclude Diet 2 in the >ExpressionSet and redo the Limma calculation, it DOES match!!: > > sib <- pData(x.norm)$sibship[c(1:4)] > > sib >SibShips > 113 113 114 114 >Levels: 101 103 113 114 > > rep <- pData(x.norm)$replicates[c(1:4)] > > rep >[1] "Control_Diet_1" "Treatment_Diet_1" >"Control_Diet_1" "Treatment_Diet_1" > > design <- model.matrix(~0+rep+sib) >Warning message: >In model.matrix.default(~0 + rep + sib) : > variable 'rep' converted to a factor > > design > repControl_Diet_1 repTreatment_Diet_1 sib103 sib113 sib114 >1 1 0 0 1 0 >2 0 1 0 1 0 >3 1 0 0 0 1 >4 0 1 0 0 1 >attr(,"assign") >[1] 1 1 2 2 2 >attr(,"contrasts") >attr(,"contrasts")$rep >[1] "contr.treatment" >attr(,"contrasts")$sib >[1] "contr.treatment" > > fit <- lmFit(x.norm[,c(1:4)], design) >Coefficients not estimable: sib103 sib114 > > contrast.matrix <- > makeContrasts(contrasts=c("repControl_Diet_1-repTreatment_Diet_1"), > levels=design) > > contrast.matrix > Contrasts >Levels repControl_Diet_1-repTreatment_Diet_1 > repControl_Diet_1 1 > repTreatment_Diet_1 -1 > sib103 0 > sib113 0 > sib114 0 > > fit <- contrasts.fit(fit, contrast.matrix) > > tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma > > pvalue.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, > lower.tail=FALSE) > > tstat.ord[1,] >[1] -0.412843 > > pvalue.ord[1,] >[1] 0.7507451 > >Which is in agreement with the t.test-calculation I demonstrated >previously! And this is exactly my problem: As soon as more diets >are included, I cannot (correctly) fit a model anymore that gives me >t-values which are in agreement with my reference method. Why? And >more importantly: how to create a linear model (and/or contrasts >matrix) that fixes this problem? Any help is highly appreciated! > >Kind regards, > >Philip > > > >________________________________ > >From: James W. MacDonald [mailto:jmacdon at med.umich.edu] >Sent: Thu 29-11-2007 15:07 >To: Groot, Philip de >Cc: bioconductor at stat.math.ethz.ch >Subject: Re: [BioC] Limma: doing multiple paired t-tests in one go... > > > >Hi Philip, > >Does this help? > > > sib >[1] 113 113 114 114 101 101 103 103 >Levels: 101 103 113 114 > > rep >[1] Control_Diet_1 Treatment_Diet_1 >[3] Control_Diet_1 Treatment_Diet_1 >[5] Control_Diet_2 Treatment_Diet_2 >[7] Control_Diet_2 Treatment_Diet_2 >4 Levels: Control_Diet_1 ... Treatment_Diet_2 > > design <- model.matrix(~0+rep+sib) > > design > repControl_Diet_1 repControl_Diet_2 >1 1 0 >2 0 0 >3 1 0 >4 0 0 >5 0 1 >6 0 0 >7 0 1 >8 0 0 > repTreatment_Diet_1 repTreatment_Diet_2 sib103 >1 0 0 0 >2 1 0 0 >3 0 0 0 >4 1 0 0 >5 0 0 0 >6 0 1 0 >7 0 0 1 >8 0 1 1 > sib113 sib114 >1 1 0 >2 1 0 >3 0 1 >4 0 1 >5 0 0 >6 0 0 >7 0 0 >8 0 0 >attr(,"assign") >[1] 1 1 1 1 2 2 2 >attr(,"contrasts") >attr(,"contrasts")$rep >[1] "contr.treatment" > >attr(,"contrasts")$sib >[1] "contr.treatment" > >Best, > >Jim > > > > >Groot, Philip de wrote: > > Hello All, > > > > I encountered a problem that I cannot easily solve, most probably > because my knowledge of linear models is too restricted. The > problem is that I want to do a paired t-test using limma, but that > I want to fit multiple comparisons (using different patients!) > simultanuously. The reason for this is that all .CEL-files in my > experiment are fitted and this fit is used for the eBayes() command > to maximize the advantage of using the eBayes approach. > > > > I found in the bioconductor mailing list a somewhat related topic: > > > > > https://stat.ethz.ch/pipermail/bioconductor/2007-February/016123.htm > l <https: stat.ethz.ch="" pipermail="" bioconductor="" 2007-february="" 016123.="" html=""> > > > > However, my problem is different. Instead of having multiple > treatments over the same patients, I have multiple treatments over > multiple patients (but still can do a paired t-test because before > and after a single treatment is done on the same person). > > > > For simplicity, let's assume that I have 2 diets and 2 patients > for each diet. My pData(x.norm) looks like this: > > > > sample replicates sibship > > A96_hA_09_113_1_base.CEL 1 Control_Diet_1 113 > > A96_hA_40_113_1_final.CEL 2 Treatment_Diet_1 113 > > A96_hA_10_114_1_base.CEL 3 Control_Diet_1 114 > > A96_hA_41_114_1_final.CEL 4 Treatment_Diet_1 114 > > A96_hA_01_101_2_base.CEL 5 Control_Diet_2 101 > > A96_hA_32_101_2_final.CEL 6 Treatment_Diet_2 101 > > A96_hA_02_103_2_base.CEL 7 Control_Diet_2 103 > > A96_hA_33_103_2_final.CEL 8 Treatment_Diet_2 103 > > > > My design matrix (for a paired t-test) is calculated as follows > (from the Limma user guide): > > Replicates <- factor(pData(x.norm)$replicates) > > SibShip <- factor(pData(x.norm)$sibship) > > design <- model.matrix(~SibShip+Replicates) > > > > And the design matrix looks like this: > > (Intercept) SibShip103 SibShip113 SibShip114 ReplicatesControl_Diet_2 > > 1 1 0 1 0 0 > > 2 1 0 1 0 0 > > 3 1 0 0 1 0 > > 4 1 0 0 1 0 > > 5 1 0 0 0 1 > > 6 1 0 0 0 0 > > 7 1 1 0 0 1 > > 8 1 1 0 0 0 > > ReplicatesTreatment_Diet_1 ReplicatesTreatment_Diet_2 > > 1 0 0 > > 2 1 0 > > 3 0 0 > > 4 1 0 > > 5 0 0 > > 6 0 1 > > 7 0 0 > > 8 0 1 > > attr(,"assign") > > [1] 0 1 1 1 2 2 2 > > attr(,"contrasts") > > attr(,"contrasts")$SibShip > > [1] "contr.treatment" > > attr(,"contrasts")$Replicates > > [1] "contr.treatment" > > > > > > As you can image, the comparisons I am interested in are > Control_Diet_1-Treatment_Diet_1 and > Control_Diet_2-Treatment_Diet_2. I might also be interested in > Control_Diet_1-Control_Diet_2 and > Treatment_Diet_1-Treatment_Diet_2, and so forth. My problem is that > the current design matrix is rather complicated and that multiple > interaction effects are somehow included, i.e. I cannot get > individual effects by simply subtracting two factors in the design > matrix (as I understand it). My question is: how can I create a > contrast matrix that gives me the comparisons I am interested in? I > am really looking forward to an answer! > > > > Kind Regards, > > > > Dr. Philip de Groot > > Wageningen University > > > > > >-- >James W. MacDonald, M.S. >Biostatistician >Affymetrix and cDNA Microarray Core >University of Michigan Cancer Center >1500 E. Medical Center Drive >7410 CCGC >Ann Arbor MI 48109 >734-647-5623