Search
Question: Experimental Design, Model Matrix and Contrasts
0
10.4 years ago by
I'm getting myself confused with setting up these things for a limma analysis. Please stop me and correct me if I'm wrong with any of this! I have two factors each with two levels: Treatment - "C" and "T" Gestation - "d60" and "d67" I setup up my model and design matrix as follows: TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation, sep=".") TG<-factor(TG,levels=unique(TG)) design<-model.matrix(~0+TG) colnames(design)<-levels(TG) which gave the following design matrix: C.d60 T.d60 C.d67 T.d67 1 1 0 0 0 2 1 0 0 0 3 1 0 0 0 4 1 0 0 0 5 0 1 0 0 6 0 1 0 0 7 0 1 0 0 8 0 1 0 0 9 0 1 0 0 10 0 0 1 0 11 0 0 1 0 12 0 0 1 0 13 0 0 1 0 14 0 0 0 1 15 0 0 0 1 16 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$TG [1] "contr.treatment" So I can use contrasts to pull out comparisons of Control Day 60 (C.d60) versus Treatment Day 60 (T.d60), Control Day 67 (C.d67) versus Treatment Day 67 (T.d67), the treatment main effect as well as the interaction terms using the following: cont.matrix<-makeContrasts( Treatment_d60 = T.d60 - C.d60, Treatment_d67 = T.d67 - C.d67, Treatment = (T.d60+T.d67) - (C.d60+C.d67), Diff = (T.d67-C.d67) - (T.d60-C.d60), levels=design ) Which gives the following contrast matrix: Contrasts Levels Treatment_d60 Treatment_d67 Treatment Diff C.d60 -1 0 -1 1 T.d60 1 0 1 -1 C.d67 0 -1 -1 -1 T.d67 0 1 1 1 I've now realised that there is contamination in some of my samples and would like to include this in the model so I can try to remove it's effect and try to increase the power to detect differentially expressed genes for the contrasts show above. However I'm not clear on how to proceed. I don't think the above approach can be extended to include the contamination effect. So I though something like this might work/be correct: Treatment<-factor(targets$Treatment, levels=unique(targets$Treatment)) Gestation<-factor(targets$Gestation, levels=unique(targets$Gestation)) Contamination<-factor(targets$Contamination, levels=unique(targets$Contamination)) design<-model.matrix(~0+Treatment*Gestation+Contamination) Which give the following design matrix: TreatmentC TreatmentT Gestationd67 Contamination1 TreatmentT:Gestationd67 1 1 0 0 0 0 2 1 0 0 1 0 3 1 0 0 0 0 4 1 0 0 1 0 5 0 1 0 0 0 6 0 1 0 1 0 7 0 1 0 1 0 8 0 1 0 1 0 9 0 1 0 0 0 10 1 0 1 0 0 11 1 0 1 0 0 12 1 0 1 0 0 13 1 0 1 1 0 14 0 1 1 0 1 15 0 1 1 0 1 16 0 1 1 0 1 attr(,"assign") [1] 1 1 2 3 4 attr(,"contrasts") attr(,"contrasts")$Treatment [1] "contr.treatment" attr(,"contrasts")$Gestation [1] "contr.treatment" attr(,"contrasts")$Contamination [1] "contr.treatment" I now have a few questions: 1) When specifying the model, what's the difference between using ~Treatment*Gestation and ~0+Treatment*Gestation and how does the alter the interpretation of the design matrix? 2) Using the above design matrix (assuming it is correct/the best way to go forward), how would I get the Treatment effect, Gestation effect, interaction, as well as the treatment effect for d60 Gestation and the treatment effect for d67? 3) Out of interest, how would I find out the genes affected by the contamination? I'm somewhat confused, so if I'm making any serious errors, please let me know! Nathan ------------------------------------------------------------- Dr. Nathan S. Watson-Haigh (publish under Haigh, N.S.) OCE Post Doctoral Fellow CSIRO Livestock Industries J M Rendel Laboratory Rockhampton QLD 4701 Tel: +61 (0)7 4923 8121 Australia Fax: +61 (0)7 4923 8222
modified 10.4 years ago • written 10.4 years ago by Nathan.Watson-Haigh@csiro.au210
0
10.4 years ago by
Mark Cowley400
Australia
Mark Cowley400 wrote:
Hi Nathan, How about adding the contamination factor to your original design, do the first lmFit, then then fit exactly the same contrasts as you have specified. The effect of contamination will be taken care of in the linear model, then you are left with 'cleaner' estimates of the C/T d60/67 parameters, which your contrasts can pick out. cheers, Mark On 17/07/2008, at 9:59 AM, <nathan.watson-haigh at="" csiro.au=""> <nathan .watson-haigh="" at="" csiro.au=""> wrote: > I'm getting myself confused with setting up these things for a limma > analysis. Please stop me and correct me if I'm wrong with any of this! > > I have two factors each with two levels: > Treatment - "C" and "T" > Gestation - "d60" and "d67" > > I setup up my model and design matrix as follows: > > TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation, sep=".") > TG<-factor(TG,levels=unique(TG)) > design<-model.matrix(~0+TG) > colnames(design)<-levels(TG) > > which gave the following design matrix: > > C.d60 T.d60 C.d67 T.d67 > 1 1 0 0 0 > 2 1 0 0 0 > 3 1 0 0 0 > 4 1 0 0 0 > 5 0 1 0 0 > 6 0 1 0 0 > 7 0 1 0 0 > 8 0 1 0 0 > 9 0 1 0 0 > 10 0 0 1 0 > 11 0 0 1 0 > 12 0 0 1 0 > 13 0 0 1 0 > 14 0 0 0 1 > 15 0 0 0 1 > 16 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$TG > [1] "contr.treatment" > > So I can use contrasts to pull out comparisons of Control Day 60 > (C.d60) > versus Treatment Day 60 (T.d60), Control Day 67 (C.d67) versus > Treatment > Day 67 (T.d67), the treatment main effect as well as the interaction > terms using the following: > > cont.matrix<-makeContrasts( > Treatment_d60 = T.d60 - C.d60, > Treatment_d67 = T.d67 - C.d67, > Treatment = (T.d60+T.d67) - (C.d60+C.d67), > Diff = (T.d67-C.d67) - (T.d60-C.d60), > levels=design > ) > > Which gives the following contrast matrix: > > Contrasts > Levels Treatment_d60 Treatment_d67 Treatment Diff > C.d60 -1 0 -1 1 > T.d60 1 0 1 -1 > C.d67 0 -1 -1 -1 > T.d67 0 1 1 1 > > I've now realised that there is contamination in some of my samples > and > would like to include this in the model so I can try to remove it's > effect and try to increase the power to detect differentially > expressed > genes for the contrasts show above. However I'm not clear on how to > proceed. I don't think the above approach can be extended to include > the > contamination effect. So I though something like this might work/be > correct: > > Treatment<-factor(targets$Treatment, levels=unique(targets$Treatment)) > Gestation<-factor(targets$Gestation, levels=unique(targets$Gestation)) > Contamination<-factor(targets$Contamination, > levels=unique(targets$Contamination)) > design<-model.matrix(~0+Treatment*Gestation+Contamination) > > Which give the following design matrix: > TreatmentC TreatmentT Gestationd67 Contamination1 > TreatmentT:Gestationd67 > 1 1 0 0 0 > 0 > 2 1 0 0 1 > 0 > 3 1 0 0 0 > 0 > 4 1 0 0 1 > 0 > 5 0 1 0 0 > 0 > 6 0 1 0 1 > 0 > 7 0 1 0 1 > 0 > 8 0 1 0 1 > 0 > 9 0 1 0 0 > 0 > 10 1 0 1 0 > 0 > 11 1 0 1 0 > 0 > 12 1 0 1 0 > 0 > 13 1 0 1 1 > 0 > 14 0 1 1 0 > 1 > 15 0 1 1 0 > 1 > 16 0 1 1 0 > 1 > attr(,"assign") > [1] 1 1 2 3 4 > attr(,"contrasts") > attr(,"contrasts")$Treatment > [1] "contr.treatment" > > attr(,"contrasts")$Gestation > [1] "contr.treatment" > > attr(,"contrasts")$Contamination > [1] "contr.treatment" > > > I now have a few questions: > 1) When specifying the model, what's the difference between using > ~Treatment*Gestation and ~0+Treatment*Gestation and how does the alter > the interpretation of the design matrix? > 2) Using the above design matrix (assuming it is correct/the best > way to > go forward), how would I get the Treatment effect, Gestation effect, > interaction, as well as the treatment effect for d60 Gestation and the > treatment effect for d67? > 3) Out of interest, how would I find out the genes affected by the > contamination? > > I'm somewhat confused, so if I'm making any serious errors, please let > me know! > Nathan > > ------------------------------------------------------------- > Dr. Nathan S. Watson-Haigh (publish under Haigh, N.S.) > OCE Post Doctoral Fellow > CSIRO Livestock Industries > J M Rendel Laboratory > Rockhampton > QLD 4701 Tel: +61 (0)7 4923 8121 > Australia Fax: +61 (0)7 4923 8222 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
If I do as you say, I have three factors with two levels each: Treatment - "C" and "T" Gestation - "d60" and "d67" Contamination - 0 and 1 I set up the design matrix as follows: TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation, pData(eSet)$Muscle_Contamination, sep=".") TG<-factor(TG,levels=unique(TG)) design<-model.matrix(~0+TG) colnames(design)<-levels(TG) Which gives the following design matrix: C.d60.0 C.d60.1 T.d60.0 T.d60.1 C.d67.0 C.d67.1 T.d67.0 1 1 0 0 0 0 0 0 2 0 1 0 0 0 0 0 3 1 0 0 0 0 0 0 4 0 1 0 0 0 0 0 5 0 0 1 0 0 0 0 6 0 0 0 1 0 0 0 7 0 0 0 1 0 0 0 8 0 0 0 1 0 0 0 9 0 0 1 0 0 0 0 10 0 0 0 0 1 0 0 11 0 0 0 0 1 0 0 12 0 0 0 0 1 0 0 13 0 0 0 0 0 1 0 14 0 0 0 0 0 0 1 15 0 0 0 0 0 0 1 16 0 0 0 0 0 0 1 attr(,"assign") [1] 1 1 1 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$TG [1] "contr.treatment" For a contrast of C.d60 vs T.d60, I then need to group together C.d60.0 and C.d60.1 don't I? Doing this I have a contrast marix like this: cont.matrix<-makeContrasts( Treatment_d60 = (T.d60.0+T.d60.1) - (C.d60.0+C.d60.1), Treatment_d67 = (T.d67.0) - (C.d67.0+C.d67.1), Treatment = (T.d60.0+T.d60.1+T.d67.0) - (C.d60.0+C.d60.1+C.d67.0+C.d67.1), Diff = ((T.d67.0) - (C.d67.0+C.d67.1)) - ((T.d60.0+T.d60.1) - (C.d60.0+C.d60.1)), Contamination = (T.d60.1+C.d60.1+C.d67.1) - (T.d60.0+C.d60.0+T.d67.0+C.d67.0), levels=design ) Which is: Contrasts Levels Treatment_d60 Treatment_d67 Treatment Diff Contamination C.d60.0 -1 0 -1 1 -1 C.d60.1 -1 0 -1 1 1 T.d60.0 1 0 1 -1 -1 T.d60.1 1 0 1 -1 1 C.d67.0 0 -1 -1 -1 -1 C.d67.1 0 -1 -1 -1 1 T.d67.0 0 1 1 1 -1 First of all does this look like it should be doing what I want i.e. removing the contamination effect? Or do in order to get the Treatment_d60 effect, do I need to "remove" the contamination effect explicitly? E.g. a contast like this: Treatment_d60 = (T.d60.0+T.d60.1) - (C.d60.0+C.d60.1) - ((T.d60.1+C.d60.1+C.d67.1) - (T.d60.0+C.d60.0+T.d67.0+C.d67.0)) Cheers, Nathan -----Original Message----- From: Mark Cowley [mailto:m.cowley0@gmail.com] Sent: Monday, 21 July 2008 8:55 AM To: Watson-Haigh, Nathan (LI, Rock. Rendel) Cc: bioconductor at stat.math.ethz.ch Subject: Re: [BioC] Experimental Design, Model Matrix and Contrasts Hi Nathan, How about adding the contamination factor to your original design, do the first lmFit, then then fit exactly the same contrasts as you have specified. The effect of contamination will be taken care of in the linear model, then you are left with 'cleaner' estimates of the C/T d60/67 parameters, which your contrasts can pick out. cheers, Mark On 17/07/2008, at 9:59 AM, <nathan.watson-haigh at="" csiro.au=""> <nathan.watson-haigh at="" csiro.au=""> wrote: > I'm getting myself confused with setting up these things for a limma > analysis. Please stop me and correct me if I'm wrong with any of this! > > I have two factors each with two levels: > Treatment - "C" and "T" > Gestation - "d60" and "d67" > > I setup up my model and design matrix as follows: > > TG<-paste(pData(eSet)$Treatment, pData(eSet)$Gestation, sep=".") > TG<-factor(TG,levels=unique(TG)) > design<-model.matrix(~0+TG) > colnames(design)<-levels(TG) > > which gave the following design matrix: > > C.d60 T.d60 C.d67 T.d67 > 1 1 0 0 0 > 2 1 0 0 0 > 3 1 0 0 0 > 4 1 0 0 0 > 5 0 1 0 0 > 6 0 1 0 0 > 7 0 1 0 0 > 8 0 1 0 0 > 9 0 1 0 0 > 10 0 0 1 0 > 11 0 0 1 0 > 12 0 0 1 0 > 13 0 0 1 0 > 14 0 0 0 1 > 15 0 0 0 1 > 16 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$TG > [1] "contr.treatment" > > So I can use contrasts to pull out comparisons of Control Day 60 > (C.d60) > versus Treatment Day 60 (T.d60), Control Day 67 (C.d67) versus > Treatment > Day 67 (T.d67), the treatment main effect as well as the interaction > terms using the following: > > cont.matrix<-makeContrasts( > Treatment_d60 = T.d60 - C.d60, > Treatment_d67 = T.d67 - C.d67, > Treatment = (T.d60+T.d67) - (C.d60+C.d67), > Diff = (T.d67-C.d67) - (T.d60-C.d60), > levels=design > ) > > Which gives the following contrast matrix: > > Contrasts > Levels Treatment_d60 Treatment_d67 Treatment Diff > C.d60 -1 0 -1 1 > T.d60 1 0 1 -1 > C.d67 0 -1 -1 -1 > T.d67 0 1 1 1 > > I've now realised that there is contamination in some of my samples > and > would like to include this in the model so I can try to remove it's > effect and try to increase the power to detect differentially > expressed > genes for the contrasts show above. However I'm not clear on how to > proceed. I don't think the above approach can be extended to include > the > contamination effect. So I though something like this might work/be > correct: > > Treatment<-factor(targets$Treatment, levels=unique(targets$Treatment)) > Gestation<-factor(targets$Gestation, levels=unique(targets$Gestation)) > Contamination<-factor(targets$Contamination, > levels=unique(targets$Contamination)) > design<-model.matrix(~0+Treatment*Gestation+Contamination) > > Which give the following design matrix: > TreatmentC TreatmentT Gestationd67 Contamination1 > TreatmentT:Gestationd67 > 1 1 0 0 0 > 0 > 2 1 0 0 1 > 0 > 3 1 0 0 0 > 0 > 4 1 0 0 1 > 0 > 5 0 1 0 0 > 0 > 6 0 1 0 1 > 0 > 7 0 1 0 1 > 0 > 8 0 1 0 1 > 0 > 9 0 1 0 0 > 0 > 10 1 0 1 0 > 0 > 11 1 0 1 0 > 0 > 12 1 0 1 0 > 0 > 13 1 0 1 1 > 0 > 14 0 1 1 0 > 1 > 15 0 1 1 0 > 1 > 16 0 1 1 0 > 1 > attr(,"assign") > [1] 1 1 2 3 4 > attr(,"contrasts") > attr(,"contrasts")$Treatment > [1] "contr.treatment" > > attr(,"contrasts")$Gestation > [1] "contr.treatment" > > attr(,"contrasts")$Contamination > [1] "contr.treatment" > > > I now have a few questions: > 1) When specifying the model, what's the difference between using > ~Treatment*Gestation and ~0+Treatment*Gestation and how does the alter > the interpretation of the design matrix? > 2) Using the above design matrix (assuming it is correct/the best > way to > go forward), how would I get the Treatment effect, Gestation effect, > interaction, as well as the treatment effect for d60 Gestation and the > treatment effect for d67? > 3) Out of interest, how would I find out the genes affected by the > contamination? > > I'm somewhat confused, so if I'm making any serious errors, please let > me know! > Nathan > > ------------------------------------------------------------- > Dr. Nathan S. Watson-Haigh (publish under Haigh, N.S.) > OCE Post Doctoral Fellow > CSIRO Livestock Industries > J M Rendel Laboratory > Rockhampton > QLD 4701 Tel: +61 (0)7 4923 8121 > Australia Fax: +61 (0)7 4923 8222 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor