Limma design and contrast matrix question.
2
2
Entering edit mode
@matthew-thornton-5564
Last seen 4 weeks ago
USA, Los Angeles, USC
Hello, I am analysing microarray data collected with Affymetrix MouseGene 2.0 ST chips. I have a few questions about properly using limma. I have four groups with three replicates. The groups are Control, Treatment #1 & #2, Treatment #1, and Treatment #2. I may not have the proper design matrix. I am not familiar with their use in linear regression. Currently, my design matrix is set up like this: # Design matrix for Limma design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3,4,4,4))) colnames(design) <- c("Control", "Group1", "Group2", "Group3") > design Control Group1 Group2 Group3 1 1 0 0 0 2 1 0 0 0 3 1 0 0 0 4 0 1 0 0 5 0 1 0 0 6 0 1 0 0 7 0 0 1 0 8 0 0 1 0 9 0 0 1 0 10 0 0 0 1 11 0 0 0 1 12 0 0 0 1 attr(,"assign") [1] 1 1 1 1 attr(,"contrasts") attr(,"contrasts")$factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)) [1] "contr.treatment" This was modified directly from the limma users guide page 36. Should the control group be all 1's? and should Group1 (treatment 1 & 2) be 1's from row 7:12? I would like to find genes different from control and I would like to find genes differentially expressed between the combination of treatments versus each treatment alone. My Contrasts matrix is set up like this: # Limma contrast matrix more than 5, no Venn diagrams. contrast.matrix <- makeContrasts(Group1-Control, Group2-Control, Group3-Control, Group3-Group2, Group3-Group1, Group2-Group1, levels=design) > contrast.matrix Contrasts Levels Group1 - Control Group2 - Control Group3 - Control Group3 - Group2 Control -1 -1 -1 0 Group1 1 0 0 0 Group2 0 1 0 -1 Group3 0 0 1 1 Contrasts Levels Group3 - Group1 Group2 - Group1 Control 0 0 Group1 -1 -1 Group2 0 1 Group3 1 0 Is there a better way to relate the fact that Group 2 is a combination of treatment 1 and treatment 2? Thanks! Matt Matthew E. Thornton Research Lab Specialist Saban Research Institute USC/Children?s Hospital Los Angeles 513X, Mail Stop 35 4661 W. Sunset Blvd. Los Angeles, CA 90027-6020 matthew.thornton at med.usc.edu ADD COMMENT 0 Entering edit mode Dario Strbenac ★ 1.5k @dario-strbenac-5916 Last seen 23 days ago Australia Hello, Your design matrix is correct in the group means style. It is not the only correct way to do it, though. How you tested for the interaction effect of the two treatments is wrong. The contrast should be Group1 - Group2 - Group3 + Control. I also don't think your Group1 - Control, Group3 - Group1 and Group2 - Group1 contrasts are meaningful. -------------------------------------- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia ADD COMMENT 0 Entering edit mode Hi Dario! Thank you for your reply. I am glad that I have a proper design matrix. I will try the 'Group1 -Group2 -Group3 + Control' contrasts.matrix. I am not very experienced with general linear models. So let me guess why that would be an appropriate contrast matrix, is it because the null hypothesis would be the mean of Group1 = mean Group2 + mean Group3 - mean Control? So if I am interested in comparing Group2 to Control, I should have included 'Group2 + Control' in the contrasts.matrix I really appreciate your help. Thank you again! Sincerely, Matt Matthew E. Thornton Research Lab Specialist Saban Research Institute USC/Children?s Hospital Los Angeles 513X, Mail Stop 35 4661 W. Sunset Blvd. Los Angeles, CA 90027-6020 matthew.thornton at med.usc.edu ________________________________________ From: Dario Strbenac [dstr7320@uni.sydney.edu.au] Sent: Thursday, January 23, 2014 5:00 PM To: Thornton, Matthew; bioconductor at r-project.org Subject: RE: Limma design and contrast matrix question. Hello, Your design matrix is correct in the group means style. It is not the only correct way to do it, though. How you tested for the interaction effect of the two treatments is wrong. The contrast should be Group1 - Group2 - Group3 + Control. I also don't think your Group1 - Control, Group3 - Group1 and Group2 - Group1 contrasts are meaningful. -------------------------------------- Dario Strbenac PhD Student University of Sydney Camperdown NSW 2050 Australia ADD REPLY 0 Entering edit mode @james-w-macdonald-5106 Last seen 2 days ago United States Hi Matt, On 1/23/2014 7:13 PM, Thornton, Matthew wrote: > Hello, > > I am analysing microarray data collected with Affymetrix MouseGene 2.0 ST chips. I have a few questions about properly using limma. I have four groups with three replicates. The groups are Control, Treatment #1 & #2, Treatment #1, and Treatment #2. I may not have the proper design matrix. I am not familiar with their use in linear regression. Currently, my design matrix is set up like this: > > # Design matrix for Limma > design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3,4,4,4))) > colnames(design) <- c("Control", "Group1", "Group2", "Group3") > >> design > Control Group1 Group2 Group3 > 1 1 0 0 0 > 2 1 0 0 0 > 3 1 0 0 0 > 4 0 1 0 0 > 5 0 1 0 0 > 6 0 1 0 0 > 7 0 0 1 0 > 8 0 0 1 0 > 9 0 0 1 0 > 10 0 0 0 1 > 11 0 0 0 1 > 12 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)) > [1] "contr.treatment" > > This was modified directly from the limma users guide page 36. Should the control group be all 1's? and should Group1 (treatment 1 & 2) be 1's from row 7:12? I would like to find genes different from control and I would like to find genes differentially expressed between the combination of treatments versus each treatment alone. You could parameterize your model that way, but I would personally keep it the way you have it. This parameterization (a cell means model) simply computes the mean expression for each group, and then you have to make all contrasts explicitly. The model with all 1's in the first column has an implicit contrast (everything is a comparison to control), so you have to make contrasts for some comparisons, but not for others. Either way you will get the same exact results - the only difference is the interpretation of the coefficients. > > My Contrasts matrix is set up like this: > > # Limma contrast matrix more than 5, no Venn diagrams. > contrast.matrix <- makeContrasts(Group1-Control, Group2-Control, Group3-Control, Group3-Group2, Group3-Group1, Group2-Group1, levels=design) > >> contrast.matrix > Contrasts > Levels Group1 - Control Group2 - Control Group3 - Control Group3 - Group2 > Control -1 -1 -1 0 > Group1 1 0 0 0 > Group2 0 1 0 -1 > Group3 0 0 1 1 > Contrasts > Levels Group3 - Group1 Group2 - Group1 > Control 0 0 > Group1 -1 -1 > Group2 0 1 > Group3 1 0 > > Is there a better way to relate the fact that Group 2 is a combination of treatment 1 and treatment 2? Not really. This is the point at which I start the experimental design grilling session. Why did you do the combination treatment? What did you expect to see (e.g., what is your hypothesis that you are testing)? Do you expect synergistic effects? If so, by how much? You could use treat() to see if you reliably get a particular fold change increase in group 2 versus both group 3 and group 4 (or the mean expression of those two treatments). But how you analyze that combination treatment is dependent on the hypothesis you are testing. Best, Jim > > Thanks! > > Matt > Matthew E. Thornton > > Research Lab Specialist > Saban Research Institute > > USC/Children?s Hospital Los Angeles > 513X, Mail Stop 35 > 4661 W. Sunset Blvd. > Los Angeles, CA 90027-6020 > > matthew.thornton at med.usc.edu > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099
0
Entering edit mode
Hi James, Thank you for your reply! > Not really. This is the point at which I start the experimental design grilling session. Why did you do the combination treatment? What did you expect to see (e.g., what is your hypothesis that you are testing)? Yes, we are expecting synergy between the treatments. I am simply processing the data, I didn't isolate the cells or process the samples. I have intentionally kept vague as to the exact details of the experiment, to prevent inadvertent bias. I know that there are two treatments, a combination of the two treatments and a control. I also know which group is which. I am looking for differences between each treatment versus control, the differences between combination and each individual treatment. There is a dramatic effect seen in vivo from the combination, but not so much from either treatment separately, so they do expect larger changes in gene expression for the combination than each individual treatment. > Do you expect synergistic effects? If so, by how much? You could use treat() to see if you reliably get a particular fold change increase in group 2 versus both group 3 and group 4 (or the mean expression of those two treatments). But how you analyze that combination treatment is dependent on the hypothesis you are testing. I looked in the Limma Users guide for more information on the treat() function and there isn't very much info, or more importantly for me, an example. I would have liked to use the treat() function with a lfc, based on the observed log fold change from the ERCC controls. Is there a better resource for the treat() function? Thank you again! Sincerely, Matt Matthew E. Thornton Research Lab Specialist Saban Research Institute USC/Children?s Hospital Los Angeles 513X, Mail Stop 35 4661 W. Sunset Blvd. Los Angeles, CA 90027-6020 matthew.thornton at med.usc.edu ________________________________________ From: James W. MacDonald [jmacdon@uw.edu] Sent: Friday, January 24, 2014 7:07 AM To: Thornton, Matthew Cc: bioconductor at r-project.org Subject: Re: [BioC] Limma design and contrast matrix question. Hi Matt, On 1/23/2014 7:13 PM, Thornton, Matthew wrote: > Hello, > > I am analysing microarray data collected with Affymetrix MouseGene 2.0 ST chips. I have a few questions about properly using limma. I have four groups with three replicates. The groups are Control, Treatment #1 & #2, Treatment #1, and Treatment #2. I may not have the proper design matrix. I am not familiar with their use in linear regression. Currently, my design matrix is set up like this: > > # Design matrix for Limma > design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3,4,4,4))) > colnames(design) <- c("Control", "Group1", "Group2", "Group3") > >> design > Control Group1 Group2 Group3 > 1 1 0 0 0 > 2 1 0 0 0 > 3 1 0 0 0 > 4 0 1 0 0 > 5 0 1 0 0 > 6 0 1 0 0 > 7 0 0 1 0 > 8 0 0 1 0 > 9 0 0 1 0 > 10 0 0 0 1 > 11 0 0 0 1 > 12 0 0 0 1 > attr(,"assign") > [1] 1 1 1 1 > attr(,"contrasts") > attr(,"contrasts")$factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)) > [1] "contr.treatment" > > This was modified directly from the limma users guide page 36. Should the control group be all 1's? and should Group1 (treatment 1 & 2) be 1's from row 7:12? I would like to find genes different from control and I would like to find genes differentially expressed between the combination of treatments versus each treatment alone. You could parameterize your model that way, but I would personally keep it the way you have it. This parameterization (a cell means model) simply computes the mean expression for each group, and then you have to make all contrasts explicitly. The model with all 1's in the first column has an implicit contrast (everything is a comparison to control), so you have to make contrasts for some comparisons, but not for others. Either way you will get the same exact results - the only difference is the interpretation of the coefficients. > > My Contrasts matrix is set up like this: > > # Limma contrast matrix more than 5, no Venn diagrams. > contrast.matrix <- makeContrasts(Group1-Control, Group2-Control, Group3-Control, Group3-Group2, Group3-Group1, Group2-Group1, levels=design) > >> contrast.matrix > Contrasts > Levels Group1 - Control Group2 - Control Group3 - Control Group3 - Group2 > Control -1 -1 -1 0 > Group1 1 0 0 0 > Group2 0 1 0 -1 > Group3 0 0 1 1 > Contrasts > Levels Group3 - Group1 Group2 - Group1 > Control 0 0 > Group1 -1 -1 > Group2 0 1 > Group3 1 0 > > Is there a better way to relate the fact that Group 2 is a combination of treatment 1 and treatment 2? Not really. This is the point at which I start the experimental design grilling session. Why did you do the combination treatment? What did you expect to see (e.g., what is your hypothesis that you are testing)? Do you expect synergistic effects? If so, by how much? You could use treat() to see if you reliably get a particular fold change increase in group 2 versus both group 3 and group 4 (or the mean expression of those two treatments). But how you analyze that combination treatment is dependent on the hypothesis you are testing. Best, Jim > > Thanks! > > Matt > Matthew E. Thornton > > Research Lab Specialist > Saban Research Institute > > USC/Children?s Hospital Los Angeles > 513X, Mail Stop 35 > 4661 W. Sunset Blvd. > Los Angeles, CA 90027-6020 > > matthew.thornton at med.usc.edu > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099 ADD REPLY 0 Entering edit mode Hi Matt, Yeah, I don't know if there is much about treat in the limma User's Guide or not. There is some info in the eBayes help page (?treat will get you there), and there is a paper (http://bioinformatics.oxfordjournals.org/content/25/6/765.short). Usage is pretty straightforward. You just use treat() and topTreat() in lieu of eBayes() and topTable(). The basic idea is that the alternate hypothesis you are testing is that the absolute difference between the mean of two groups is greater than some quantity (rather than the conventional alternate hypothesis, which is that the absolute difference is greater than zero). Note that the addition of a threshold to the alternative hypothesis tends to make it a pretty big hurdle, so I wouldn't go crazy with the lfc argument. In other words, it seems pretty common for people to add a post hoc 1.5 or 2-fold change requirement (e.g., topTable(fit2, lfc = 1)). That tends to be a much less conservative test than using treat with an lfc = 1 argument, so you will likely be better served by using the smallest log fold change that the biologists think is plausibly linked to a phenotypic difference. Best, Jim On 1/24/2014 12:55 PM, Thornton, Matthew wrote: > Hi James, > > Thank you for your reply! > >> Not really. This is the point at which I start the experimental design > grilling session. Why did you do the combination treatment? What did you > expect to see (e.g., what is your hypothesis that you are testing)? > > Yes, we are expecting synergy between the treatments. I am simply processing the data, I didn't isolate the cells or process the samples. I have intentionally kept vague as to the exact details of the experiment, to prevent inadvertent bias. I know that there are two treatments, a combination of the two treatments and a control. I also know which group is which. I am looking for differences between each treatment versus control, the differences between combination and each individual treatment. There is a dramatic effect seen in vivo from the combination, but not so much from either treatment separately, so they do expect larger changes in gene expression for the combination than each individual treatment. > >> Do you expect synergistic effects? If so, by how much? You could use > treat() to see if you reliably get a particular fold change increase in > group 2 versus both group 3 and group 4 (or the mean expression of those > two treatments). But how you analyze that combination treatment is > dependent on the hypothesis you are testing. > > I looked in the Limma Users guide for more information on the treat() function and there isn't very much info, or more importantly for me, an example. I would have liked to use the treat() function with a lfc, based on the observed log fold change from the ERCC controls. Is there a better resource for the treat() function? > > Thank you again! > > Sincerely, > > Matt > > > > Matthew E. Thornton > > Research Lab Specialist > Saban Research Institute > > USC/Children?s Hospital Los Angeles > 513X, Mail Stop 35 > 4661 W. Sunset Blvd. > Los Angeles, CA 90027-6020 > > matthew.thornton at med.usc.edu > > ________________________________________ > From: James W. MacDonald [jmacdon at uw.edu] > Sent: Friday, January 24, 2014 7:07 AM > To: Thornton, Matthew > Cc: bioconductor at r-project.org > Subject: Re: [BioC] Limma design and contrast matrix question. > > Hi Matt, > > On 1/23/2014 7:13 PM, Thornton, Matthew wrote: >> Hello, >> >> I am analysing microarray data collected with Affymetrix MouseGene 2.0 ST chips. I have a few questions about properly using limma. I have four groups with three replicates. The groups are Control, Treatment #1 & #2, Treatment #1, and Treatment #2. I may not have the proper design matrix. I am not familiar with their use in linear regression. Currently, my design matrix is set up like this: >> >> # Design matrix for Limma >> design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2,3,3,3,4,4,4))) >> colnames(design) <- c("Control", "Group1", "Group2", "Group3") >> >>> design >> Control Group1 Group2 Group3 >> 1 1 0 0 0 >> 2 1 0 0 0 >> 3 1 0 0 0 >> 4 0 1 0 0 >> 5 0 1 0 0 >> 6 0 1 0 0 >> 7 0 0 1 0 >> 8 0 0 1 0 >> 9 0 0 1 0 >> 10 0 0 0 1 >> 11 0 0 0 1 >> 12 0 0 0 1 >> attr(,"assign") >> [1] 1 1 1 1 >> attr(,"contrasts") >> attr(,"contrasts")$factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)) >> [1] "contr.treatment" >> >> This was modified directly from the limma users guide page 36. Should the control group be all 1's? and should Group1 (treatment 1 & 2) be 1's from row 7:12? I would like to find genes different from control and I would like to find genes differentially expressed between the combination of treatments versus each treatment alone. > You could parameterize your model that way, but I would personally keep > it the way you have it. This parameterization (a cell means model) > simply computes the mean expression for each group, and then you have to > make all contrasts explicitly. The model with all 1's in the first > column has an implicit contrast (everything is a comparison to control), > so you have to make contrasts for some comparisons, but not for others. > > Either way you will get the same exact results - the only difference is > the interpretation of the coefficients. > >> My Contrasts matrix is set up like this: >> >> # Limma contrast matrix more than 5, no Venn diagrams. >> contrast.matrix <- makeContrasts(Group1-Control, Group2-Control, Group3-Control, Group3-Group2, Group3-Group1, Group2-Group1, levels=design) >> >>> contrast.matrix >> Contrasts >> Levels Group1 - Control Group2 - Control Group3 - Control Group3 - Group2 >> Control -1 -1 -1 0 >> Group1 1 0 0 0 >> Group2 0 1 0 -1 >> Group3 0 0 1 1 >> Contrasts >> Levels Group3 - Group1 Group2 - Group1 >> Control 0 0 >> Group1 -1 -1 >> Group2 0 1 >> Group3 1 0 >> >> Is there a better way to relate the fact that Group 2 is a combination of treatment 1 and treatment 2? > Not really. This is the point at which I start the experimental design > grilling session. Why did you do the combination treatment? What did you > expect to see (e.g., what is your hypothesis that you are testing)? > > Do you expect synergistic effects? If so, by how much? You could use > treat() to see if you reliably get a particular fold change increase in > group 2 versus both group 3 and group 4 (or the mean expression of those > two treatments). But how you analyze that combination treatment is > dependent on the hypothesis you are testing. > > Best, > > Jim > > > > >> Thanks! >> >> Matt >> Matthew E. Thornton >> >> Research Lab Specialist >> Saban Research Institute >> >> USC/Children?s Hospital Los Angeles >> 513X, Mail Stop 35 >> 4661 W. Sunset Blvd. >> Los Angeles, CA 90027-6020 >> >> matthew.thornton at med.usc.edu >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- > James W. MacDonald, M.S. > Biostatistician > University of Washington > Environmental and Occupational Health Sciences > 4225 Roosevelt Way NE, # 100 > Seattle WA 98105-6099 > -- James W. MacDonald, M.S. Biostatistician University of Washington Environmental and Occupational Health Sciences 4225 Roosevelt Way NE, # 100 Seattle WA 98105-6099