Contrasts for 3x2 factorial experiment in R/edgeR
1
0
Entering edit mode
@gordon-smyth
Last seen 58 minutes ago
WEHI, Melbourne, Australia
Dear Ben, glmLRT() allows you to specify the contrast to be tested in two ways. If the contrast happens to correspond to a column of your design matrix, then it is one the original coefficients and you can just specify which one using the coef= argument. Otherwise you can use the contrast= argument to specify any contrast, using a numerical vector of the same length as the number of coefficients. Suppose you used the default treatment parametrization: contrasts(Strain) <- contr.treat(3) contrasts(Treatment) <- contr.treat(2) and fitted a model with Treatment nested within Strain: design <- model.matrix(~Strain+Strain:Treatment) Then the coefficients 4:6 would be the logFC after stimulation in strains A to C respectively. Hence you could test for differential expression after stimulation in strain B by: glmLRT(dge,fit,coef=5) and so on. You could equally well specify the same test by glmLRT(dge,fit,contrast=c(0,0,0,0,1,0,0)) To test for genes responding differently to simulation in strain B compared to strain A, you would use contrast=c(0,0,0,-1,1,0) And so on. Figuring out the appropriate contrast vectors using the sum-to-zero parametrization specified by contr.sum() is more tedious, and I will leave that to you. Best wishes Gordon > Date: Tue, 20 Dec 2011 16:44:42 -0500 > From: Benjamin King <bking at="" mdibl.org=""> > To: bioconductor at r-project.org > Subject: [BioC] Contrasts for 3x2 factorial experiment in R/edgeR > > Hello, > > I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts. > > The design of my experiment has two factors, strain and treatment. There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated). I have four biological replicates (except for one sample group where there are only three). > > # Group Strain Treatment > # A.Un A Un > # B.Un B Un > # C.Un C Un > # A.Stim A Stim > # B.Stim B Stim > # C.Stim C Stim > > Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows. > > Intercept: (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6 > Strain1: (-A.Un + B.Un - A.Stim + B.Stim)/4 > Strain2: (-A.Un + C.Un - A.Stim + C.Stim)/4 > Treatment1: (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6 > Strain1:Treatment1: (A.Un - B.Un - A.Stim + B.Stim)/4 > Strain2:Treatment2: (A.Un - C.Un - A.Stim + C.Stim)/4 > > I'm interested in these questions: > > 1. Which genes are differentially expressed in Strain B compared to Strain A? > 2. Which genes are differentially expressed in Strain C compared to Strain A? > 3. Which genes respond to Stimulated treatment overall? > 4. Which genes respond to Stimulated treatment in Strain B? > 5. Which genes respond to Stimulated treatment in Strain C? > 6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? > 7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? > > I believe questions 1, 2, 3, 6 and 7 are model coefficients. If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function? > > For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function? > > Below is my current script and session information. > > Thank you in advance for your help, > > - Ben > > > > library(edgeR) > > # 3x2 Factorial Design > # Strain Treatment > # A Un > # B Un > # C Un > # A Stim > # B Stim > # C Stim > > ## Specify factors > Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A"," A","A","A","B","B","B","B","C","C","C","C")) > Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un"," Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim ","Stim","Stim","Stim")) > > ## Read in count data > raw.data <- read.table("group_counts.txt",sep="\t",header=T) > d <- raw.data[, 2:24] > rownames(d) <- raw.data[, 1] > > # make DGE object > dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B. Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.Sti m","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C.S tim","C.Stim")) > dge <- calcNormFactors(dge) > > # filter uninformative genes > m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size) > ridx <- rowSums(m > 1) >= 2 > dge <- dge[ridx,] > > ## Design matrix > # treatment-contrast parameterization > contrasts(Strain) <- contr.sum(3) > contrasts(Treatment) <- contr.sum(2) > design <- model.matrix(~Strain*Treatment) > > ## Estimate Dispersion > dge <- estimateGLMCommonDisp(dge, design) > > ## Fit model with Common Dispersion > fit <- glmFit(dge, design, dispersion=dge$common.dispersion) > > > > >> sessionInfo() > R version 2.13.1 (2011-07-08) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods base > > other attached packages: > [1] edgeR_2.2.5 > > loaded via a namespace (and not attached): > [1] limma_3.8.3 tools_2.13.1 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
• 2.0k views
ADD COMMENT
0
Entering edit mode
@benjamin-king-4567
Last seen 9.6 years ago
Dear Gordon, I very much appreciate your reply and I will follow your suggestion of not using the sum-to-zero parameterization. I do have one additional question about interpreting the model coefficients. If my design matrix is as shown below and the model is "~Strain+Strain:Treatment" is the correct interpretation of the coefficients? Coefficient Interpretation Intercept Strain1 "Strain B compared to Strain A (i.e., Which genes are differentially expressed in Strain B compared to Strain A?)" Strain2 "Strain C compared to Strain A (i.e., Which genes are differentially expressed in Strain C compared to Strain A?)" Treatment1 "Overall treatment effect (i.e., Which genes respond to Stimulated treatment overall?)" Strain1:Treatment1 "Interaction between StrainB and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?)" Strain2:Treatment2 "Interaction between StrainC and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain C compared to Strain A?)" (Intercept) Strain1 Strain2 Treatment1 Strain1:Treatment1 Strain2:Treatment1 A.Un 1 -1 -1 -1 1 1 A.Un 1 -1 -1 -1 1 1 A.Un 1 -1 -1 -1 1 1 B.Un 1 1 0 -1 -1 0 B.Un 1 1 0 -1 -1 0 B.Un 1 1 0 -1 -1 0 B.Un 1 1 0 -1 -1 0 C.Un 1 0 0 -1 0 -1 C.Un 1 0 0 -1 0 -1 C.Un 1 0 0 -1 0 -1 C.Un 1 0 0 -1 0 -1 A.Stim 1 -1 -1 1 -1 -1 A.Stim 1 -1 -1 1 -1 -1 A.Stim 1 -1 -1 1 -1 -1 A.Stim 1 -1 -1 1 -1 -1 B.Stim 1 1 0 1 1 0 B.Stim 1 1 0 1 1 0 B.Stim 1 1 0 1 1 0 B.Stim 1 1 0 1 1 0 C.Stim 1 0 1 1 0 1 C.Stim 1 0 1 1 0 1 C.Stim 1 0 1 1 0 1 C.Stim 1 0 1 1 0 1 If this is correct, then is the interaction between strain B and treatment the fifth coefficient or c(0,0,0,0,1,0)? Likewise, is the interaction between strain C and treatment the sixth coefficient or c(0,0,0,0,0,1)? Thank you, - Ben On Jan 2, 2012, at 6:16 PM, Gordon K Smyth wrote: > Dear Ben, > > glmLRT() allows you to specify the contrast to be tested in two ways. If the contrast happens to correspond to a column of your design matrix, then it is one the original coefficients and you can just specify which one using the coef= argument. Otherwise you can use the contrast= argument to specify any contrast, using a numerical vector of the same length as the number of coefficients. > > Suppose you used the default treatment parametrization: > > contrasts(Strain) <- contr.treat(3) > contrasts(Treatment) <- contr.treat(2) > > and fitted a model with Treatment nested within Strain: > > design <- model.matrix(~Strain+Strain:Treatment) > > Then the coefficients 4:6 would be the logFC after stimulation in strains A to C respectively. Hence you could test for differential expression after stimulation in strain B by: > > glmLRT(dge,fit,coef=5) > > and so on. You could equally well specify the same test by > > glmLRT(dge,fit,contrast=c(0,0,0,0,1,0,0)) > > To test for genes responding differently to simulation in strain B compared to strain A, you would use > > contrast=c(0,0,0,-1,1,0) > > And so on. > > Figuring out the appropriate contrast vectors using the sum-to-zero parametrization specified by contr.sum() is more tedious, and I will leave that to you. > > Best wishes > Gordon > >> Date: Tue, 20 Dec 2011 16:44:42 -0500 >> From: Benjamin King <bking@mdibl.org> >> To: bioconductor@r-project.org >> Subject: [BioC] Contrasts for 3x2 factorial experiment in R/edgeR >> >> Hello, >> >> I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts. >> >> The design of my experiment has two factors, strain and treatment. There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated). I have four biological replicates (except for one sample group where there are only three). >> >> # Group Strain Treatment >> # A.Un A Un >> # B.Un B Un >> # C.Un C Un >> # A.Stim A Stim >> # B.Stim B Stim >> # C.Stim C Stim >> >> Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows. >> >> Intercept: (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6 >> Strain1: (-A.Un + B.Un - A.Stim + B.Stim)/4 >> Strain2: (-A.Un + C.Un - A.Stim + C.Stim)/4 >> Treatment1: (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6 >> Strain1:Treatment1: (A.Un - B.Un - A.Stim + B.Stim)/4 >> Strain2:Treatment2: (A.Un - C.Un - A.Stim + C.Stim)/4 >> >> I'm interested in these questions: >> >> 1. Which genes are differentially expressed in Strain B compared to Strain A? >> 2. Which genes are differentially expressed in Strain C compared to Strain A? >> 3. Which genes respond to Stimulated treatment overall? >> 4. Which genes respond to Stimulated treatment in Strain B? >> 5. Which genes respond to Stimulated treatment in Strain C? >> 6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? >> 7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? >> >> I believe questions 1, 2, 3, 6 and 7 are model coefficients. If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function? >> >> For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function? >> >> Below is my current script and session information. >> >> Thank you in advance for your help, >> >> - Ben >> >> >> >> library(edgeR) >> >> # 3x2 Factorial Design >> # Strain Treatment >> # A Un >> # B Un >> # C Un >> # A Stim >> # B Stim >> # C Stim >> >> ## Specify factors >> Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A", "A","A","A","B","B","B","B","C","C","C","C")) >> Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un", "Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Sti m","Stim","Stim","Stim")) >> >> ## Read in count data >> raw.data <- read.table("group_counts.txt",sep="\t",header=T) >> d <- raw.data[, 2:24] >> rownames(d) <- raw.data[, 1] >> >> # make DGE object >> dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un","B .Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.St im","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C. Stim","C.Stim")) >> dge <- calcNormFactors(dge) >> >> # filter uninformative genes >> m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size) >> ridx <- rowSums(m > 1) >= 2 >> dge <- dge[ridx,] >> >> ## Design matrix >> # treatment-contrast parameterization >> contrasts(Strain) <- contr.sum(3) >> contrasts(Treatment) <- contr.sum(2) >> design <- model.matrix(~Strain*Treatment) >> >> ## Estimate Dispersion >> dge <- estimateGLMCommonDisp(dge, design) >> >> ## Fit model with Common Dispersion >> fit <- glmFit(dge, design, dispersion=dge$common.dispersion) >> >> >> >> >>> sessionInfo() >> R version 2.13.1 (2011-07-08) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] splines stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] edgeR_2.2.5 >> >> loaded via a namespace (and not attached): >> [1] limma_3.8.3 tools_2.13.1 > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Dear Ben, Your design matrix is still using the sum-to-zero parametrization, which I find a bit unhelpful in the genomic context. As I said in my previous reply, I will leave interpretations of that parametrization to you. Best wishes Gordon On Tue, 3 Jan 2012, Benjamin King wrote: > Dear Gordon, > > I very much appreciate your reply and I will follow your suggestion of > not using the sum-to-zero parameterization. > > I do have one additional question about interpreting the model > coefficients. If my design matrix is as shown below and the model is > "~Strain+Strain:Treatment" is the correct interpretation of the > coefficients? > > Coefficient Interpretation > Intercept > Strain1 "Strain B compared to Strain A (i.e., Which genes are differentially expressed in Strain B compared to Strain A?)" > Strain2 "Strain C compared to Strain A (i.e., Which genes are differentially expressed in Strain C compared to Strain A?)" > Treatment1 "Overall treatment effect (i.e., Which genes respond to Stimulated treatment overall?)" > Strain1:Treatment1 "Interaction between StrainB and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain B compared to Strain A?)" > Strain2:Treatment2 "Interaction between StrainC and Treatment (i.e., Which genes respond to Stimulated treatment differently in Strain C compared to Strain A?)" > > (Intercept) Strain1 Strain2 Treatment1 Strain1:Treatment1 Strain2:Treatment1 > A.Un 1 -1 -1 -1 1 1 > A.Un 1 -1 -1 -1 1 1 > A.Un 1 -1 -1 -1 1 1 > B.Un 1 1 0 -1 -1 0 > B.Un 1 1 0 -1 -1 0 > B.Un 1 1 0 -1 -1 0 > B.Un 1 1 0 -1 -1 0 > C.Un 1 0 0 -1 0 -1 > C.Un 1 0 0 -1 0 -1 > C.Un 1 0 0 -1 0 -1 > C.Un 1 0 0 -1 0 -1 > A.Stim 1 -1 -1 1 -1 -1 > A.Stim 1 -1 -1 1 -1 -1 > A.Stim 1 -1 -1 1 -1 -1 > A.Stim 1 -1 -1 1 -1 -1 > B.Stim 1 1 0 1 1 0 > B.Stim 1 1 0 1 1 0 > B.Stim 1 1 0 1 1 0 > B.Stim 1 1 0 1 1 0 > C.Stim 1 0 1 1 0 1 > C.Stim 1 0 1 1 0 1 > C.Stim 1 0 1 1 0 1 > C.Stim 1 0 1 1 0 1 > > If this is correct, then is the interaction between strain B and treatment the fifth coefficient or c(0,0,0,0,1,0)? > > Likewise, is the interaction between strain C and treatment the sixth coefficient or c(0,0,0,0,0,1)? > > Thank you, > > - Ben > > > On Jan 2, 2012, at 6:16 PM, Gordon K Smyth wrote: > >> Dear Ben, >> >> glmLRT() allows you to specify the contrast to be tested in two ways. If the contrast happens to correspond to a column of your design matrix, then it is one the original coefficients and you can just specify which one using the coef= argument. Otherwise you can use the contrast= argument to specify any contrast, using a numerical vector of the same length as the number of coefficients. >> >> Suppose you used the default treatment parametrization: >> >> contrasts(Strain) <- contr.treat(3) >> contrasts(Treatment) <- contr.treat(2) >> >> and fitted a model with Treatment nested within Strain: >> >> design <- model.matrix(~Strain+Strain:Treatment) >> >> Then the coefficients 4:6 would be the logFC after stimulation in strains A to C respectively. Hence you could test for differential expression after stimulation in strain B by: >> >> glmLRT(dge,fit,coef=5) >> >> and so on. You could equally well specify the same test by >> >> glmLRT(dge,fit,contrast=c(0,0,0,0,1,0,0)) >> >> To test for genes responding differently to simulation in strain B compared to strain A, you would use >> >> contrast=c(0,0,0,-1,1,0) >> >> And so on. >> >> Figuring out the appropriate contrast vectors using the sum-to-zero >> parametrization specified by contr.sum() is more tedious, and I will >> leave that to you. >> >> Best wishes >> Gordon >> >>> Date: Tue, 20 Dec 2011 16:44:42 -0500 >>> From: Benjamin King <bking at="" mdibl.org=""> >>> To: bioconductor at r-project.org >>> Subject: [BioC] Contrasts for 3x2 factorial experiment in R/edgeR >>> >>> Hello, >>> >>> I'm using R/edgeR to analyze a 3x2 factorial RNA-Seq experiment and I would very much appreciate your help in specifying some contrasts. >>> >>> The design of my experiment has two factors, strain and treatment. There are three strains (A, B and C) and two treatments (Unstimulated and Stimulated). I have four biological replicates (except for one sample group where there are only three). >>> >>> # Group Strain Treatment >>> # A.Un A Un >>> # B.Un B Un >>> # C.Un C Un >>> # A.Stim A Stim >>> # B.Stim B Stim >>> # C.Stim C Stim >>> >>> Using "model.matrix(~Strain*Treatment)" I believe the model coefficients using sum to zero parameterization are as follows. >>> >>> Intercept: (A.Un + B.Un + C.Un + A.Stim + B.Stim + C.Stim)/6 >>> Strain1: (-A.Un + B.Un - A.Stim + B.Stim)/4 >>> Strain2: (-A.Un + C.Un - A.Stim + C.Stim)/4 >>> Treatment1: (-A.Un - B.Un - C.Un + A.Stim + B.Stim + C.Stim)/6 >>> Strain1:Treatment1: (A.Un - B.Un - A.Stim + B.Stim)/4 >>> Strain2:Treatment2: (A.Un - C.Un - A.Stim + C.Stim)/4 >>> >>> I'm interested in these questions: >>> >>> 1. Which genes are differentially expressed in Strain B compared to Strain A? >>> 2. Which genes are differentially expressed in Strain C compared to Strain A? >>> 3. Which genes respond to Stimulated treatment overall? >>> 4. Which genes respond to Stimulated treatment in Strain B? >>> 5. Which genes respond to Stimulated treatment in Strain C? >>> 6. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? >>> 7. Which genes respond to Stimulated treatment differently in Strain B compared to Strain A? >>> >>> I believe questions 1, 2, 3, 6 and 7 are model coefficients. If so, how can these contrasts be calculated using the "coef" parameter in the glmLRT function? >>> >>> For questions 4 and 5, what is the correct syntax to define the contrast matrix using sum to zero parameterization that I would then pass to the glmLRT function? >>> >>> Below is my current script and session information. >>> >>> Thank you in advance for your help, >>> >>> - Ben >>> >>> >>> >>> library(edgeR) >>> >>> # 3x2 Factorial Design >>> # Strain Treatment >>> # A Un >>> # B Un >>> # C Un >>> # A Stim >>> # B Stim >>> # C Stim >>> >>> ## Specify factors >>> Strain <- factor(c("A","A","A","B","B","B","B","C","C","C","C","A" ,"A","A","A","B","B","B","B","C","C","C","C")) >>> Treatment <- factor(c("Un","Un","Un","Un","Un","Un","Un","Un","Un" ,"Un","Un","Stim","Stim","Stim","Stim","Stim","Stim","Stim","Stim","St im","Stim","Stim","Stim")) >>> >>> ## Read in count data >>> raw.data <- read.table("group_counts.txt",sep="\t",header=T) >>> d <- raw.data[, 2:24] >>> rownames(d) <- raw.data[, 1] >>> >>> # make DGE object >>> dge <- DGEList(counts = d, group = c("A.Un","A.Un","A.Un","B.Un"," B.Un","B.Un","B.Un","C.Un","C.Un","C.Un","C.Un","A.Stim","A.Stim","A.S tim","A.Stim","B.Stim","B.Stim","B.Stim","B.Stim","C.Stim","C.Stim","C .Stim","C.Stim")) >>> dge <- calcNormFactors(dge) >>> >>> # filter uninformative genes >>> m <- 1e6 * t(t(dge$counts) / dge$samples$lib.size) >>> ridx <- rowSums(m > 1) >= 2 >>> dge <- dge[ridx,] >>> >>> ## Design matrix >>> # treatment-contrast parameterization >>> contrasts(Strain) <- contr.sum(3) >>> contrasts(Treatment) <- contr.sum(2) >>> design <- model.matrix(~Strain*Treatment) >>> >>> ## Estimate Dispersion >>> dge <- estimateGLMCommonDisp(dge, design) >>> >>> ## Fit model with Common Dispersion >>> fit <- glmFit(dge, design, dispersion=dge$common.dispersion) >>> >>> >>> >>> >>>> sessionInfo() >>> R version 2.13.1 (2011-07-08) >>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] splines stats graphics grDevices utils datasets methods base >>> >>> other attached packages: >>> [1] edgeR_2.2.5 >>> >>> loaded via a namespace (and not attached): >>> [1] limma_3.8.3 tools_2.13.1 >> > > > > > > > > ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}
ADD REPLY

Login before adding your answer.

Traffic: 679 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6