Question: Limma, model with several factors
0
6.1 years ago by
Jenny Drnevich1.9k
United States
Jenny Drnevich1.9k wrote:
Hi Gordon, I'm confused by your response to Ingrid last May as to how to construct a design matrix for her 2 factor experiment with repeated measures on 1 factor. I came across this post in searching the mailing list because I have a similar experimental design, but I was getting the dreaded "Coefficients not estimable" warning message with what I thought was the same design matrix you suggested to Ingrid. Your example below actually produces a singular design matrix: #reproducible example: > Subject <- gl(4,2) > treatment <- gl(2,2,8) > timepoint <- gl(2,1,8) > Treat.Time <- factor(paste(treatment,timepoint,sep=".")) > design <- model.matrix(~0+Treat.Time+Subject) > colnames(design)[1:4] <- levels(Treat.Time) > design 1.1 1.2 2.1 2.2 Subject2 Subject3 Subject4 1 1 0 0 0 0 0 0 2 0 1 0 0 0 0 0 3 0 0 1 0 1 0 0 4 0 0 0 1 1 0 0 5 1 0 0 0 0 1 0 6 0 1 0 0 0 1 0 7 0 0 1 0 0 0 1 8 0 0 0 1 0 0 1 attr(,"assign") [1] 1 1 1 1 2 2 2 attr(,"contrasts") attr(,"contrasts")$Treat.Time [1] "contr.treatment" attr(,"contrasts")$Subject [1] "contr.treatment" > > is.fullrank(design) [1] FALSE > > nonEstimable(design) [1] "Subject4" My searches of the mailing list led to a previous question also about the same experimental design, in which your response seems to indicate that you can't treat Subject as a fixed effect: https://stat.ethz.ch/pipermail/bioconductor/2010-March/032523.html Is this really true? I found this page (http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html) about modeling repeated measures factors in R, and it seems you could model a single gene like this: #not-reproducible > aov.out = aov(expression.level ~ treatment * timepoint + Error(Subject/timepoint), data=one.gene) But this model formulation doesn't work with model.matrix(). So I'm confused about the best way to model my experiment: I have 11 subjects divided into 4 treatments with 3 timepoint measurements per subject and we're interested in treatment effects, time effects and interaction effects. I'm worried about simply treating Subject as a random effect because when I look separately within each treatment, only 1 of the 4 treatments actually has any substantial Subject effect! (Subject correlation from duplicateCorrelation() for the entire experiment is 0.038, but within each treatment the values are -0.036, -0.042, -0.041 and 0.110). If none of them had an observable Subject effect, I could just ignore it and do a standard 2-factor ANOVA design, but because only one treatment has an effect I'm worried the overall correlation value over-estimates the Subject effect in 3 treatments and under-estimates in the 1 treatment. Can I possibly drop some of the Subject coefficients from model.matrix(~0+Treat.Time+Subject) and only keep those pertaining to the treatment that does show Subject effects? I would greatly appreciate any advice or suggestions! Jenny -----Original Message----- From: bioconductor-bounces@r-project.org [mailto:bioconductor- bounces@r-project.org] On Behalf Of Gordon K Smyth Sent: Monday, May 20, 2013 7:29 PM To: Ingrid Dahlman [guest] Cc: Bioconductor mailing list Subject: [BioC] Limma, model with several factors Dear Ingrid, If you block on subject (as in Section 8.4 of the User's Guide), then you have automatically blocked on center as well, because the subjects are different in the two centers. You experiment can be analyzed as suggested in Section 8.4.2 of the User's Guide. First combine treatment and timepoint into one factor (following Section 8.5.2): Treat.Time <- factor(paste(treatment,timepoint,sep=".")) Then make a design matrix including the blocking factor: design <- model.matrix(~0+Treat.Time+Subject) colnames(design)[1:4] <- levels(Treat.Time) cont.matrix <- makeContrasts((F.A-F.B)-(P.A-P.B),levels=design) etc. The rule is that the experimental factors of interest are combined into the treatment factors, whereas the nuisance factors are blocked on. Best wishes Gordon > Date: Mon, 20 May 2013 01:01:08 -0700 (PDT) > From: "Ingrid Dahlman [guest]" <guest at="" bioconductor.org=""> > To: bioconductor at r-project.org, ingrid.dahlman at ki.se > Subject: [BioC] Limma, model with several factors > > > I have carefully read the Limma users guide but still have not sorted > out how I design the contrasts for the following Project. > > In want to compare the effect (After vs before) of two different > treatments, F and P. The study was carried our in two different centers. > This can be illustrated as follows: > Subject center treatment timepoint > 1 1 F B > 1 1 F A > 2 1 P B > 2 1 P A > 3 2 F B > 3 2 F A > 4 2 P B > 4 2 P A > I want to compare (F.A-F.B)-(P.A-P.B), blocking for subject. However, > in addition, I would like to block for center. I.e. the center is like > a batch effect. > Is it possible to block for two factors, suject and center, in the > same test in Limma? > /Ingrid > > -- output of sessionInfo(): > > See section 8.7 in the user guide. > > -- > Sent via the guest posting facility at bioconductor.org. > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:9}}
limma • 3.2k views
modified 6.1 years ago by Gordon Smyth38k • written 6.1 years ago by Jenny Drnevich1.9k
Answer: Limma, model with several factors
0
6.1 years ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:
Dear Jenny, First, let me comment on your experiment. I think that you are over-analysing to some extent. There are strong apriori reasons to expect repeated measures on the same subject to be correlated, so I think you should include this in your model even though the correlation appears to be low for your data. I don't agree with trying to estimate a repeated measures correlation for some treatment levels and not others, unless there are very special aspects to your experiment that I am not aware of. I view this as cherry picking. My advice: just use duplicateCorrelation as usual and be done with it. The correlation is small, so it will only make a modest difference compared to ignoring it, but it is the right thing to do in principle. I would not try to treat patients as fixed effects with your data, not because it cannot be done, but because it would be overly heavy-handed considering the weak intra-patient correlations. In saying this, I am assuming that you have done the usual quality assessment plots like plotMDS, have looked for outlier samples and have considered the need for array quality weights and so on. There are many designs for which the patient effect can be treated as either a fixed or a random. You give a link to a post from me from March 2013. You claim I indicated that Subject can't be treated as a fixed effect, but I did not say that. I said that treating patient as random is one solution. I did not say or imply that it cannot be treated as fixed. Now let me turn to Ingrid Dahlman's experiment. You are correct that the design matrix I suggested to Ingrid was singular. Given that design matrix, limma would have automatically fixed the singularity by removing the design column for Subject4 and hence treated Subject1 and Subject4 as if they were the same person. That analysis is obviously not quite correct, although it probably would have given reasonable results in practice. Ingrid said that she wanted to compare (F.A-F.B)-(P.A-P.B) and that she wanted to block for subject. I took that to mean that she wanted compare before (B) to after (A) using within-subject information only. That implies a fixed effect model, so I did not consider the random effects model in my reply to her. Subject center treatment timepoint 1 1 F B 1 1 F A 2 1 P B 2 1 P A 3 2 F B 3 2 F A 4 2 P B 4 2 P A The problem with my suggested model formula was that it is not possible to compare F to P within subject, because each subject receives only one of the treatments. However it is possible to estimate the contrast asked for with a custom design matrix: F.AvsB <- as.numeric( treatment=="F" & timepoint=="A") P.AvsB <- as.numeric( treatment=="P" & timepoint=="A") design <- model.matrix(~Subject+F.AvsB+P.AvsB) This is a full rank matrix with 6 columns. The comparison that Ingrid wanted is available as the contrast F.AvsB-P.AvsB. Finally, I am not sure what you want me to say about the aov() command with an Error() term in the model. That is a neat way of fitting a very special random effects model. It fits a fixed effects model but interprets it as partly random. It works for balanced univariate designs for which the moment estimators from the ANOVA table coincide with the REML estimators of the variance components. It is not applicable to Ingrid's design. The duplicateCorrelation() approach in limma fits a similar model, with added assumptions about consistency across genes, but without the need for a balanced design. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Thu, 8 Aug 2013, Zadeh, Jenny Drnevich wrote: > Hi Gordon, > > I'm confused by your response to Ingrid last May as to how to construct > a design matrix for her 2 factor experiment with repeated measures on 1 > factor. I came across this post in searching the mailing list because I > have a similar experimental design, but I was getting the dreaded > "Coefficients not estimable" warning message with what I thought was the > same design matrix you suggested to Ingrid. Your example below actually > produces a singular design matrix: > > #reproducible example: >> Subject <- gl(4,2) >> treatment <- gl(2,2,8) >> timepoint <- gl(2,1,8) >> Treat.Time <- factor(paste(treatment,timepoint,sep=".")) >> design <- model.matrix(~0+Treat.Time+Subject) >> colnames(design)[1:4] <- levels(Treat.Time) >> design > 1.1 1.2 2.1 2.2 Subject2 Subject3 Subject4 > 1 1 0 0 0 0 0 0 > 2 0 1 0 0 0 0 0 > 3 0 0 1 0 1 0 0 > 4 0 0 0 1 1 0 0 > 5 1 0 0 0 0 1 0 > 6 0 1 0 0 0 1 0 > 7 0 0 1 0 0 0 1 > 8 0 0 0 1 0 0 1 > attr(,"assign") > [1] 1 1 1 1 2 2 2 > attr(,"contrasts") > attr(,"contrasts")$Treat.Time > [1] "contr.treatment" > > attr(,"contrasts")$Subject > [1] "contr.treatment" > >> >> is.fullrank(design) > [1] FALSE >> >> nonEstimable(design) > [1] "Subject4" > > My searches of the mailing list led to a previous question also about > the same experimental design, in which your response seems to indicate > that you can't treat Subject as a fixed effect: > https://stat.ethz.ch/pipermail/bioconductor/2010-March/032523.html > > Is this really true? I found this page > (http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html) > about modeling repeated measures factors in R, and it seems you could > model a single gene like this: > > #not-reproducible >> aov.out = aov(expression.level ~ treatment * timepoint + Error(Subject/timepoint), data=one.gene) > > But this model formulation doesn't work with model.matrix(). So I'm > confused about the best way to model my experiment: I have 11 subjects > divided into 4 treatments with 3 timepoint measurements per subject and > we're interested in treatment effects, time effects and interaction > effects. I'm worried about simply treating Subject as a random effect > because when I look separately within each treatment, only 1 of the 4 > treatments actually has any substantial Subject effect! (Subject > correlation from duplicateCorrelation() for the entire experiment is > 0.038, but within each treatment the values are -0.036, -0.042, -0.041 > and 0.110). If none of them had an observable Subject effect, I could > just ignore it and do a standard 2-factor ANOVA design, but because only > one treatment has an effect I'm worried the overall correlation value > over-estimates the Subject effect in 3 treatments and under- estimates in > the 1 treatment. Can I possibly drop some of the Subject coefficients > from model.matrix(~0+Treat.Time+Subject) and only keep those pertaining > to the treatment that does show Subject effects? > > I would greatly appreciate any advice or suggestions! > Jenny > -----Original Message----- > From: bioconductor-bounces at r-project.org [mailto:bioconductor- bounces at r-project.org] On Behalf Of Gordon K Smyth > Sent: Monday, May 20, 2013 7:29 PM > To: Ingrid Dahlman [guest] > Cc: Bioconductor mailing list > Subject: [BioC] Limma, model with several factors > > Dear Ingrid, > > If you block on subject (as in Section 8.4 of the User's Guide), then you have automatically blocked on center as well, because the subjects are different in the two centers. > > You experiment can be analyzed as suggested in Section 8.4.2 of the User's Guide. > > First combine treatment and timepoint into one factor (following Section > 8.5.2): > > Treat.Time <- factor(paste(treatment,timepoint,sep=".")) > > Then make a design matrix including the blocking factor: > > design <- model.matrix(~0+Treat.Time+Subject) > colnames(design)[1:4] <- levels(Treat.Time) > cont.matrix <- makeContrasts((F.A-F.B)-(P.A-P.B),levels=design) > > etc. > > The rule is that the experimental factors of interest are combined into the treatment factors, whereas the nuisance factors are blocked on. > > Best wishes > Gordon > >> Date: Mon, 20 May 2013 01:01:08 -0700 (PDT) >> From: "Ingrid Dahlman [guest]" <guest at="" bioconductor.org=""> >> To: bioconductor at r-project.org, ingrid.dahlman at ki.se >> Subject: [BioC] Limma, model with several factors >> >> >> I have carefully read the Limma users guide but still have not sorted >> out how I design the contrasts for the following Project. >> >> In want to compare the effect (After vs before) of two different >> treatments, F and P. The study was carried our in two different centers. >> This can be illustrated as follows: > >> Subject center treatment timepoint >> 1 1 F B >> 1 1 F A >> 2 1 P B >> 2 1 P A >> 3 2 F B >> 3 2 F A >> 4 2 P B >> 4 2 P A > >> I want to compare (F.A-F.B)-(P.A-P.B), blocking for subject. However, >> in addition, I would like to block for center. I.e. the center is like >> a batch effect. > >> Is it possible to block for two factors, suject and center, in the >> same test in Limma? > >> /Ingrid >> >> -- output of sessionInfo(): >> >> See section 8.7 in the user guide. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
Hi Gordon, Thank you for your excellent reply! I had initially thought to treat Subject as a random factor, then decided to do some more research and just confused myself more. There are a couple of borderline outlier samples, but they don't seem egregious enough to throw out. I'll try the arrayWeights and see how they are weighted. Thanks, Jenny -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Thursday, August 08, 2013 11:47 PM To: Zadeh, Jenny Drnevich Cc: Bioconductor mailing list; Ingrid Dahlman Subject: RE: Limma, model with several factors Dear Jenny, First, let me comment on your experiment. I think that you are over- analysing to some extent. There are strong apriori reasons to expect repeated measures on the same subject to be correlated, so I think you should include this in your model even though the correlation appears to be low for your data. I don't agree with trying to estimate a repeated measures correlation for some treatment levels and not others, unless there are very special aspects to your experiment that I am not aware of. I view this as cherry picking. My advice: just use duplicateCorrelation as usual and be done with it. The correlation is small, so it will only make a modest difference compared to ignoring it, but it is the right thing to do in principle. I would not try to treat patients as fixed effects with your data, not because it cannot be done, but because it would be overly heavy-handed considering the weak intra-patient correlations. In saying this, I am assuming that you have done the usual quality assessment plots like plotMDS, have looked for outlier samples and have considered the need for array quality weights and so on. There are many designs for which the patient effect can be treated as either a fixed or a random. You give a link to a post from me from March 2013. You claim I indicated that Subject can't be treated as a fixed effect, but I did not say that. I said that treating patient as random is one solution. I did not say or imply that it cannot be treated as fixed. Now let me turn to Ingrid Dahlman's experiment. You are correct that the design matrix I suggested to Ingrid was singular. Given that design matrix, limma would have automatically fixed the singularity by removing the design column for Subject4 and hence treated Subject1 and Subject4 as if they were the same person. That analysis is obviously not quite correct, although it probably would have given reasonable results in practice. Ingrid said that she wanted to compare (F.A-F.B)-(P.A-P.B) and that she wanted to block for subject. I took that to mean that she wanted compare before (B) to after (A) using within-subject information only. That implies a fixed effect model, so I did not consider the random effects model in my reply to her. Subject center treatment timepoint 1 1 F B 1 1 F A 2 1 P B 2 1 P A 3 2 F B 3 2 F A 4 2 P B 4 2 P A The problem with my suggested model formula was that it is not possible to compare F to P within subject, because each subject receives only one of the treatments. However it is possible to estimate the contrast asked for with a custom design matrix: F.AvsB <- as.numeric( treatment=="F" & timepoint=="A") P.AvsB <- as.numeric( treatment=="P" & timepoint=="A") design <- model.matrix(~Subject+F.AvsB+P.AvsB) This is a full rank matrix with 6 columns. The comparison that Ingrid wanted is available as the contrast F.AvsB-P.AvsB. Finally, I am not sure what you want me to say about the aov() command with an Error() term in the model. That is a neat way of fitting a very special random effects model. It fits a fixed effects model but interprets it as partly random. It works for balanced univariate designs for which the moment estimators from the ANOVA table coincide with the REML estimators of the variance components. It is not applicable to Ingrid's design. The duplicateCorrelation() approach in limma fits a similar model, with added assumptions about consistency across genes, but without the need for a balanced design. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Thu, 8 Aug 2013, Zadeh, Jenny Drnevich wrote: > Hi Gordon, > > I'm confused by your response to Ingrid last May as to how to > construct a design matrix for her 2 factor experiment with repeated > measures on 1 factor. I came across this post in searching the mailing > list because I have a similar experimental design, but I was getting > the dreaded "Coefficients not estimable" warning message with what I > thought was the same design matrix you suggested to Ingrid. Your > example below actually produces a singular design matrix: > > #reproducible example: >> Subject <- gl(4,2) >> treatment <- gl(2,2,8) >> timepoint <- gl(2,1,8) >> Treat.Time <- factor(paste(treatment,timepoint,sep=".")) >> design <- model.matrix(~0+Treat.Time+Subject) >> colnames(design)[1:4] <- levels(Treat.Time) design > 1.1 1.2 2.1 2.2 Subject2 Subject3 Subject4 > 1 1 0 0 0 0 0 0 > 2 0 1 0 0 0 0 0 > 3 0 0 1 0 1 0 0 > 4 0 0 0 1 1 0 0 > 5 1 0 0 0 0 1 0 > 6 0 1 0 0 0 1 0 > 7 0 0 1 0 0 0 1 > 8 0 0 0 1 0 0 1 > attr(,"assign") > [1] 1 1 1 1 2 2 2 > attr(,"contrasts") > attr(,"contrasts")$Treat.Time > [1] "contr.treatment" > > attr(,"contrasts")$Subject > [1] "contr.treatment" > >> >> is.fullrank(design) > [1] FALSE >> >> nonEstimable(design) > [1] "Subject4" > > My searches of the mailing list led to a previous question also about > the same experimental design, in which your response seems to indicate > that you can't treat Subject as a fixed effect: > https://stat.ethz.ch/pipermail/bioconductor/2010-March/032523.html > > Is this really true? I found this page > (http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html) > about modeling repeated measures factors in R, and it seems you could > model a single gene like this: > > #not-reproducible >> aov.out = aov(expression.level ~ treatment * timepoint + >> Error(Subject/timepoint), data=one.gene) > > But this model formulation doesn't work with model.matrix(). So I'm > confused about the best way to model my experiment: I have 11 subjects > divided into 4 treatments with 3 timepoint measurements per subject > and we're interested in treatment effects, time effects and > interaction effects. I'm worried about simply treating Subject as a > random effect because when I look separately within each treatment, > only 1 of the 4 treatments actually has any substantial Subject > effect! (Subject correlation from duplicateCorrelation() for the > entire experiment is 0.038, but within each treatment the values are > -0.036, -0.042, -0.041 and 0.110). If none of them had an observable > Subject effect, I could just ignore it and do a standard 2-factor > ANOVA design, but because only one treatment has an effect I'm worried > the overall correlation value over-estimates the Subject effect in 3 > treatments and under-estimates in the 1 treatment. Can I possibly drop > some of the Subject coefficients from > model.matrix(~0+Treat.Time+Subject) and only keep those pertaining to the treatment that does show Subject effects? > > I would greatly appreciate any advice or suggestions! > Jenny > -----Original Message----- > From: bioconductor-bounces at r-project.org > [mailto:bioconductor-bounces at r-project.org] On Behalf Of Gordon K > Smyth > Sent: Monday, May 20, 2013 7:29 PM > To: Ingrid Dahlman [guest] > Cc: Bioconductor mailing list > Subject: [BioC] Limma, model with several factors > > Dear Ingrid, > > If you block on subject (as in Section 8.4 of the User's Guide), then you have automatically blocked on center as well, because the subjects are different in the two centers. > > You experiment can be analyzed as suggested in Section 8.4.2 of the User's Guide. > > First combine treatment and timepoint into one factor (following > Section > 8.5.2): > > Treat.Time <- factor(paste(treatment,timepoint,sep=".")) > > Then make a design matrix including the blocking factor: > > design <- model.matrix(~0+Treat.Time+Subject) > colnames(design)[1:4] <- levels(Treat.Time) > cont.matrix <- makeContrasts((F.A-F.B)-(P.A-P.B),levels=design) > > etc. > > The rule is that the experimental factors of interest are combined into the treatment factors, whereas the nuisance factors are blocked on. > > Best wishes > Gordon > >> Date: Mon, 20 May 2013 01:01:08 -0700 (PDT) >> From: "Ingrid Dahlman [guest]" <guest at="" bioconductor.org=""> >> To: bioconductor at r-project.org, ingrid.dahlman at ki.se >> Subject: [BioC] Limma, model with several factors >> >> >> I have carefully read the Limma users guide but still have not sorted >> out how I design the contrasts for the following Project. >> >> In want to compare the effect (After vs before) of two different >> treatments, F and P. The study was carried our in two different centers. >> This can be illustrated as follows: > >> Subject center treatment timepoint >> 1 1 F B >> 1 1 F A >> 2 1 P B >> 2 1 P A >> 3 2 F B >> 3 2 F A >> 4 2 P B >> 4 2 P A > >> I want to compare (F.A-F.B)-(P.A-P.B), blocking for subject. However, >> in addition, I would like to block for center. I.e. the center is >> like a batch effect. > >> Is it possible to block for two factors, suject and center, in the >> same test in Limma? > >> /Ingrid >> >> -- output of sessionInfo(): >> >> See section 8.7 in the user guide. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}}
Hi! Thank you for updating me on the analysis. I have tried to use the approach described below by Gordon but get an error message. Can you tell me what is wrong with my script below? The study design is as before Subject center treatment timepoint 1 1 A T0 1 1 A T12 2 1 B T0 2 1 B T12 3 2 A T0 3 2 A T12 4 2 B T0 4 2 B T12 (in reality I have >60 subjects) and I wish to compare A.T12vsT0 with B.T12vsT0 I write eset12.txt<-read.delim("eset12.txt",header=TRUE) target12<-read.delim("target12.txt",header=TRUE) eset12<-readExpressionSet("eset12.txt","target12.txt",header=TRUE) SUBJECT <- factor(target12$SUBJECT) TREATMENT <- factor(target12$TREATMENT, levels=c("A","B")) TIMEPOINT <- factor(target12$TIMEPOINT, levels=c("T0","T12")) A.T0vsT12 <- as.numeric(TREATMENT=="A" & TIMEPOINT=="T0") B.T0vsT12 <- as.numeric(TREATMENT=="B" & TIMEPOINT=="T0") design <- model.matrix(~SUBJECT+A.T0vsT12+B.T0vsT12) cont.matrix <- makeContrasts(Diff12T=(A.T0vsT12-B.T0vsT12),levels=design) and after entering the last row into R I get the message Warning message: In makeContrasts(Diff12T = (A.T0vsT12 - B.T0vsT12), levels = design) : Renaming (Intercept) to Intercept What does the above mean? Ingrid ***************************** Ingrid Dahlman, MD, Associate professor Karolinska Institutet Department of Medicine, Huddinge C2-94 141 86 Stockholm SWEDEN mobil: +46 73 699 5490 ________________________________________ Fr?n: Zadeh, Jenny Drnevich [drnevich at illinois.edu] Skickat: den 13 augusti 2013 16:34 Till: Gordon K Smyth Kopia: Bioconductor mailing list; Ingrid Dahlman ?mne: RE: Limma, model with several factors Hi Gordon, Thank you for your excellent reply! I had initially thought to treat Subject as a random factor, then decided to do some more research and just confused myself more. There are a couple of borderline outlier samples, but they don't seem egregious enough to throw out. I'll try the arrayWeights and see how they are weighted. Thanks, Jenny -----Original Message----- From: Gordon K Smyth [mailto:smyth@wehi.EDU.AU] Sent: Thursday, August 08, 2013 11:47 PM To: Zadeh, Jenny Drnevich Cc: Bioconductor mailing list; Ingrid Dahlman Subject: RE: Limma, model with several factors Dear Jenny, First, let me comment on your experiment. I think that you are over- analysing to some extent. There are strong apriori reasons to expect repeated measures on the same subject to be correlated, so I think you should include this in your model even though the correlation appears to be low for your data. I don't agree with trying to estimate a repeated measures correlation for some treatment levels and not others, unless there are very special aspects to your experiment that I am not aware of. I view this as cherry picking. My advice: just use duplicateCorrelation as usual and be done with it. The correlation is small, so it will only make a modest difference compared to ignoring it, but it is the right thing to do in principle. I would not try to treat patients as fixed effects with your data, not because it cannot be done, but because it would be overly heavy-handed considering the weak intra-patient correlations. In saying this, I am assuming that you have done the usual quality assessment plots like plotMDS, have looked for outlier samples and have considered the need for array quality weights and so on. There are many designs for which the patient effect can be treated as either a fixed or a random. You give a link to a post from me from March 2013. You claim I indicated that Subject can't be treated as a fixed effect, but I did not say that. I said that treating patient as random is one solution. I did not say or imply that it cannot be treated as fixed. Now let me turn to Ingrid Dahlman's experiment. You are correct that the design matrix I suggested to Ingrid was singular. Given that design matrix, limma would have automatically fixed the singularity by removing the design column for Subject4 and hence treated Subject1 and Subject4 as if they were the same person. That analysis is obviously not quite correct, although it probably would have given reasonable results in practice. Ingrid said that she wanted to compare (F.A-F.B)-(P.A-P.B) and that she wanted to block for subject. I took that to mean that she wanted compare before (B) to after (A) using within-subject information only. That implies a fixed effect model, so I did not consider the random effects model in my reply to her. Subject center treatment timepoint 1 1 F B 1 1 F A 2 1 P B 2 1 P A 3 2 F B 3 2 F A 4 2 P B 4 2 P A The problem with my suggested model formula was that it is not possible to compare F to P within subject, because each subject receives only one of the treatments. However it is possible to estimate the contrast asked for with a custom design matrix: F.AvsB <- as.numeric( treatment=="F" & timepoint=="A") P.AvsB <- as.numeric( treatment=="P" & timepoint=="A") design <- model.matrix(~Subject+F.AvsB+P.AvsB) This is a full rank matrix with 6 columns. The comparison that Ingrid wanted is available as the contrast F.AvsB-P.AvsB. Finally, I am not sure what you want me to say about the aov() command with an Error() term in the model. That is a neat way of fitting a very special random effects model. It fits a fixed effects model but interprets it as partly random. It works for balanced univariate designs for which the moment estimators from the ANOVA table coincide with the REML estimators of the variance components. It is not applicable to Ingrid's design. The duplicateCorrelation() approach in limma fits a similar model, with added assumptions about consistency across genes, but without the need for a balanced design. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. http://www.statsci.org/smyth On Thu, 8 Aug 2013, Zadeh, Jenny Drnevich wrote: > Hi Gordon, > > I'm confused by your response to Ingrid last May as to how to > construct a design matrix for her 2 factor experiment with repeated > measures on 1 factor. I came across this post in searching the mailing > list because I have a similar experimental design, but I was getting > the dreaded "Coefficients not estimable" warning message with what I > thought was the same design matrix you suggested to Ingrid. Your > example below actually produces a singular design matrix: > > #reproducible example: >> Subject <- gl(4,2) >> treatment <- gl(2,2,8) >> timepoint <- gl(2,1,8) >> Treat.Time <- factor(paste(treatment,timepoint,sep=".")) >> design <- model.matrix(~0+Treat.Time+Subject) >> colnames(design)[1:4] <- levels(Treat.Time) design > 1.1 1.2 2.1 2.2 Subject2 Subject3 Subject4 > 1 1 0 0 0 0 0 0 > 2 0 1 0 0 0 0 0 > 3 0 0 1 0 1 0 0 > 4 0 0 0 1 1 0 0 > 5 1 0 0 0 0 1 0 > 6 0 1 0 0 0 1 0 > 7 0 0 1 0 0 0 1 > 8 0 0 0 1 0 0 1 > attr(,"assign") > [1] 1 1 1 1 2 2 2 > attr(,"contrasts") > attr(,"contrasts")$Treat.Time > [1] "contr.treatment" > > attr(,"contrasts")$Subject > [1] "contr.treatment" > >> >> is.fullrank(design) > [1] FALSE >> >> nonEstimable(design) > [1] "Subject4" > > My searches of the mailing list led to a previous question also about > the same experimental design, in which your response seems to indicate > that you can't treat Subject as a fixed effect: > https://stat.ethz.ch/pipermail/bioconductor/2010-March/032523.html > > Is this really true? I found this page > (http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html) > about modeling repeated measures factors in R, and it seems you could > model a single gene like this: > > #not-reproducible >> aov.out = aov(expression.level ~ treatment * timepoint + >> Error(Subject/timepoint), data=one.gene) > > But this model formulation doesn't work with model.matrix(). So I'm > confused about the best way to model my experiment: I have 11 subjects > divided into 4 treatments with 3 timepoint measurements per subject > and we're interested in treatment effects, time effects and > interaction effects. I'm worried about simply treating Subject as a > random effect because when I look separately within each treatment, > only 1 of the 4 treatments actually has any substantial Subject > effect! (Subject correlation from duplicateCorrelation() for the > entire experiment is 0.038, but within each treatment the values are > -0.036, -0.042, -0.041 and 0.110). If none of them had an observable > Subject effect, I could just ignore it and do a standard 2-factor > ANOVA design, but because only one treatment has an effect I'm worried > the overall correlation value over-estimates the Subject effect in 3 > treatments and under-estimates in the 1 treatment. Can I possibly drop > some of the Subject coefficients from > model.matrix(~0+Treat.Time+Subject) and only keep those pertaining to the treatment that does show Subject effects? > > I would greatly appreciate any advice or suggestions! > Jenny > -----Original Message----- > From: bioconductor-bounces at r-project.org > [mailto:bioconductor-bounces at r-project.org] On Behalf Of Gordon K > Smyth > Sent: Monday, May 20, 2013 7:29 PM > To: Ingrid Dahlman [guest] > Cc: Bioconductor mailing list > Subject: [BioC] Limma, model with several factors > > Dear Ingrid, > > If you block on subject (as in Section 8.4 of the User's Guide), then you have automatically blocked on center as well, because the subjects are different in the two centers. > > You experiment can be analyzed as suggested in Section 8.4.2 of the User's Guide. > > First combine treatment and timepoint into one factor (following > Section > 8.5.2): > > Treat.Time <- factor(paste(treatment,timepoint,sep=".")) > > Then make a design matrix including the blocking factor: > > design <- model.matrix(~0+Treat.Time+Subject) > colnames(design)[1:4] <- levels(Treat.Time) > cont.matrix <- makeContrasts((F.A-F.B)-(P.A-P.B),levels=design) > > etc. > > The rule is that the experimental factors of interest are combined into the treatment factors, whereas the nuisance factors are blocked on. > > Best wishes > Gordon > >> Date: Mon, 20 May 2013 01:01:08 -0700 (PDT) >> From: "Ingrid Dahlman [guest]" <guest at="" bioconductor.org=""> >> To: bioconductor at r-project.org, ingrid.dahlman at ki.se >> Subject: [BioC] Limma, model with several factors >> >> >> I have carefully read the Limma users guide but still have not sorted >> out how I design the contrasts for the following Project. >> >> In want to compare the effect (After vs before) of two different >> treatments, F and P. The study was carried our in two different centers. >> This can be illustrated as follows: > >> Subject center treatment timepoint >> 1 1 F B >> 1 1 F A >> 2 1 P B >> 2 1 P A >> 3 2 F B >> 3 2 F A >> 4 2 P B >> 4 2 P A > >> I want to compare (F.A-F.B)-(P.A-P.B), blocking for subject. However, >> in addition, I would like to block for center. I.e. the center is >> like a batch effect. > >> Is it possible to block for two factors, suject and center, in the >> same test in Limma? > >> /Ingrid >> >> -- output of sessionInfo(): >> >> See section 8.7 in the user guide. ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:6}} ADD REPLYlink written 6.1 years ago by Ingrid Dahlman20 Hi Ingrid, On 8/15/2013 12:12 PM, Ingrid Dahlman wrote: > Hi! > Thank you for updating me on the analysis. > I have tried to use the approach described below by Gordon but get an error message. Can you tell me what is wrong with my script below? > > The study design is as before > > Subject center treatment timepoint > 1 1 A T0 > 1 1 A T12 > 2 1 B T0 > 2 1 B T12 > 3 2 A T0 > 3 2 A T12 > 4 2 B T0 > 4 2 B T12 > (in reality I have>60 subjects) > > and I wish to compare A.T12vsT0 with B.T12vsT0 > > I write > eset12.txt<-read.delim("eset12.txt",header=TRUE) > target12<-read.delim("target12.txt",header=TRUE) > eset12<-readExpressionSet("eset12.txt","target12.txt",header=TRUE) > SUBJECT<- factor(target12$SUBJECT) > TREATMENT<- factor(target12$TREATMENT, levels=c("A","B")) > TIMEPOINT<- factor(target12$TIMEPOINT, levels=c("T0","T12")) > A.T0vsT12<- as.numeric(TREATMENT=="A"& TIMEPOINT=="T0") > B.T0vsT12<- as.numeric(TREATMENT=="B"& TIMEPOINT=="T0") > design<- model.matrix(~SUBJECT+A.T0vsT12+B.T0vsT12) > cont.matrix<- makeContrasts(Diff12T=(A.T0vsT12-B.T0vsT12),levels=design) > > and after entering the last row into R I get the message > > Warning message: > In makeContrasts(Diff12T = (A.T0vsT12 - B.T0vsT12), levels = design) : > Renaming (Intercept) to Intercept > > What does the above mean? The default in R is to label the intercept term as (Intercept), with parentheses. This warning is just telling you that it has been renamed to Intercept, without the parentheses. This won't affect your results, and is only done because the column names of your design matrix have to be syntactically valid names, and (Intercept) is not syntactically valid. Best, Jim > > Ingrid > > > ***************************** > > Ingrid Dahlman, MD, Associate professor > > Karolinska Institutet > > Department of Medicine, Huddinge > > C2-94 > > 141 86 Stockholm > > SWEDEN > > mobil: +46 73 699 5490 > > ________________________________________ > Fr?n: Zadeh, Jenny Drnevich [drnevich at illinois.edu] > Skickat: den 13 augusti 2013 16:34 > Till: Gordon K Smyth > Kopia: Bioconductor mailing list; Ingrid Dahlman > ?mne: RE: Limma, model with several factors > > Hi Gordon, > > Thank you for your excellent reply! I had initially thought to treat Subject as a random factor, then decided to do some more research and just confused myself more. There are a couple of borderline outlier samples, but they don't seem egregious enough to throw out. I'll try the arrayWeights and see how they are weighted. > > Thanks, > Jenny > > -----Original Message----- > From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU] > Sent: Thursday, August 08, 2013 11:47 PM > To: Zadeh, Jenny Drnevich > Cc: Bioconductor mailing list; Ingrid Dahlman > Subject: RE: Limma, model with several factors > > Dear Jenny, > > First, let me comment on your experiment. I think that you are over-analysing to some extent. There are strong apriori reasons to expect repeated measures on the same subject to be correlated, so I think you should include this in your model even though the correlation appears to be low for your data. I don't agree with trying to estimate a repeated measures correlation for some treatment levels and not others, unless there are very special aspects to your experiment that I am not aware of. > I view this as cherry picking. > > My advice: just use duplicateCorrelation as usual and be done with it. > The correlation is small, so it will only make a modest difference compared to ignoring it, but it is the right thing to do in principle. > > I would not try to treat patients as fixed effects with your data, not because it cannot be done, but because it would be overly heavy- handed considering the weak intra-patient correlations. > > In saying this, I am assuming that you have done the usual quality assessment plots like plotMDS, have looked for outlier samples and have considered the need for array quality weights and so on. > > There are many designs for which the patient effect can be treated as either a fixed or a random. You give a link to a post from me from March 2013. You claim I indicated that Subject can't be treated as a fixed effect, but I did not say that. I said that treating patient as random is one solution. I did not say or imply that it cannot be treated as fixed. > > Now let me turn to Ingrid Dahlman's experiment. You are correct that the design matrix I suggested to Ingrid was singular. Given that design matrix, limma would have automatically fixed the singularity by removing the design column for Subject4 and hence treated Subject1 and Subject4 as if they were the same person. That analysis is obviously not quite correct, although it probably would have given reasonable results in practice. > > Ingrid said that she wanted to compare (F.A-F.B)-(P.A-P.B) and that she wanted to block for subject. I took that to mean that she wanted compare before (B) to after (A) using within-subject information only. That implies a fixed effect model, so I did not consider the random effects model in my reply to her. > > Subject center treatment timepoint > 1 1 F B > 1 1 F A > 2 1 P B > 2 1 P A > 3 2 F B > 3 2 F A > 4 2 P B > 4 2 P A > > The problem with my suggested model formula was that it is not possible to compare F to P within subject, because each subject receives only one of the treatments. However it is possible to estimate the contrast asked for with a custom design matrix: > > F.AvsB<- as.numeric( treatment=="F"& timepoint=="A") > P.AvsB<- as.numeric( treatment=="P"& timepoint=="A") > design<- model.matrix(~Subject+F.AvsB+P.AvsB) > > This is a full rank matrix with 6 columns. The comparison that Ingrid wanted is available as the contrast F.AvsB-P.AvsB. > > Finally, I am not sure what you want me to say about the aov() command with an Error() term in the model. That is a neat way of fitting a very special random effects model. It fits a fixed effects model but interprets it as partly random. It works for balanced univariate designs for which the moment estimators from the ANOVA table coincide with the REML estimators of the variance components. It is not applicable to Ingrid's design. The duplicateCorrelation() approach in limma fits a similar model, with added assumptions about consistency across genes, but without the need for a balanced design. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. > http://www.statsci.org/smyth > > > On Thu, 8 Aug 2013, Zadeh, Jenny Drnevich wrote: > >> Hi Gordon, >> >> I'm confused by your response to Ingrid last May as to how to >> construct a design matrix for her 2 factor experiment with repeated >> measures on 1 factor. I came across this post in searching the mailing >> list because I have a similar experimental design, but I was getting >> the dreaded "Coefficients not estimable" warning message with what I >> thought was the same design matrix you suggested to Ingrid. Your >> example below actually produces a singular design matrix: >> >> #reproducible example: >>> Subject<- gl(4,2) >>> treatment<- gl(2,2,8) >>> timepoint<- gl(2,1,8) >>> Treat.Time<- factor(paste(treatment,timepoint,sep=".")) >>> design<- model.matrix(~0+Treat.Time+Subject) >>> colnames(design)[1:4]<- levels(Treat.Time) design >> 1.1 1.2 2.1 2.2 Subject2 Subject3 Subject4 >> 1 1 0 0 0 0 0 0 >> 2 0 1 0 0 0 0 0 >> 3 0 0 1 0 1 0 0 >> 4 0 0 0 1 1 0 0 >> 5 1 0 0 0 0 1 0 >> 6 0 1 0 0 0 1 0 >> 7 0 0 1 0 0 0 1 >> 8 0 0 0 1 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 2 2 2 >> attr(,"contrasts") >> attr(,"contrasts")$Treat.Time >> [1] "contr.treatment" >> >> attr(,"contrasts")$Subject >> [1] "contr.treatment" >> >>> is.fullrank(design) >> [1] FALSE >>> nonEstimable(design) >> [1] "Subject4" >> >> My searches of the mailing list led to a previous question also about >> the same experimental design, in which your response seems to indicate >> that you can't treat Subject as a fixed effect: >> https://stat.ethz.ch/pipermail/bioconductor/2010-March/032523.html >> >> Is this really true? I found this page >> (http://ww2.coastal.edu/kingw/statistics/R-tutorials/repeated.html) >> about modeling repeated measures factors in R, and it seems you could >> model a single gene like this: >> >> #not-reproducible >>> aov.out = aov(expression.level ~ treatment * timepoint + >>> Error(Subject/timepoint), data=one.gene) >> But this model formulation doesn't work with model.matrix(). So I'm >> confused about the best way to model my experiment: I have 11 subjects >> divided into 4 treatments with 3 timepoint measurements per subject >> and we're interested in treatment effects, time effects and >> interaction effects. I'm worried about simply treating Subject as a >> random effect because when I look separately within each treatment, >> only 1 of the 4 treatments actually has any substantial Subject >> effect! (Subject correlation from duplicateCorrelation() for the >> entire experiment is 0.038, but within each treatment the values are >> -0.036, -0.042, -0.041 and 0.110). If none of them had an observable >> Subject effect, I could just ignore it and do a standard 2-factor >> ANOVA design, but because only one treatment has an effect I'm worried >> the overall correlation value over-estimates the Subject effect in 3 >> treatments and under-estimates in the 1 treatment. Can I possibly drop >> some of the Subject coefficients from >> model.matrix(~0+Treat.Time+Subject) and only keep those pertaining to the treatment that does show Subject effects? >> >> I would greatly appreciate any advice or suggestions! >> Jenny > >> -----Original Message----- >> From: bioconductor-bounces at r-project.org >> [mailto:bioconductor-bounces at r-project.org] On Behalf Of Gordon K >> Smyth >> Sent: Monday, May 20, 2013 7:29 PM >> To: Ingrid Dahlman [guest] >> Cc: Bioconductor mailing list >> Subject: [BioC] Limma, model with several factors >> >> Dear Ingrid, >> >> If you block on subject (as in Section 8.4 of the User's Guide), then you have automatically blocked on center as well, because the subjects are different in the two centers. >> >> You experiment can be analyzed as suggested in Section 8.4.2 of the User's Guide. >> >> First combine treatment and timepoint into one factor (following >> Section >> 8.5.2): >> >> Treat.Time<- factor(paste(treatment,timepoint,sep=".")) >> >> Then make a design matrix including the blocking factor: >> >> design<- model.matrix(~0+Treat.Time+Subject) >> colnames(design)[1:4]<- levels(Treat.Time) >> cont.matrix<- makeContrasts((F.A-F.B)-(P.A-P.B),levels=design) >> >> etc. >> >> The rule is that the experimental factors of interest are combined into the treatment factors, whereas the nuisance factors are blocked on. >> >> Best wishes >> Gordon >> >>> Date: Mon, 20 May 2013 01:01:08 -0700 (PDT) >>> From: "Ingrid Dahlman [guest]"<guest at="" bioconductor.org=""> >>> To: bioconductor at r-project.org, ingrid.dahlman at ki.se >>> Subject: [BioC] Limma, model with several factors >>> >>> >>> I have carefully read the Limma users guide but still have not sorted >>> out how I design the contrasts for the following Project. >>> >>> In want to compare the effect (After vs before) of two different >>> treatments, F and P. The study was carried our in two different centers. >>> This can be illustrated as follows: >>> Subject center treatment timepoint >>> 1 1 F B >>> 1 1 F A >>> 2 1 P B >>> 2 1 P A >>> 3 2 F B >>> 3 2 F A >>> 4 2 P B >>> 4 2 P A >>> I want to compare (F.A-F.B)-(P.A-P.B), blocking for subject. However, >>> in addition, I would like to block for center. I.e. the center is >>> like a batch effect. >>> Is it possible to block for two factors, suject and center, in the >>> same test in Limma? >>> /Ingrid >>> >>> -- output of sessionInfo(): >>> >>> See section 8.7 in the user guide. > ______________________________________________________________________ > The information in this email is confidential and intend...{{dropped:6}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099