Question

edgeR: Specifying the "coef"-argument in glmLRT in a two factor study

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Dear Henning, Thanks for including reproducible code. I think however that you might be taking comments that were made in the edgeR User's Guide about an additive model and trying to apply them to an interaction model. Let me clarify. Your experiment has three genotypes and one treatment (stress vs control). When we analyse a two factor study, the first question we have to consider is whether it makes sense to assume that the treatment effect will be the same for each genotype. If the answer is yes, then you would fit an additive linear model: design <- model.matrix(~ genotype + treatment) This model has just four coefficients. The 4th coefficient would indeed test for a treatment effect irrespective of genotype. This type of analysis is done in the last two case studies of the edgeR User's Guide: http://bioconductor.org/packages/2.10/bioc/vignettes/edgeR/inst/doc/ed geRUsersGuide.pdf This type of analysis is appropriate when the non-treatment factor is a blocking or nuisance variable that is not of primary scientific interest, for example a batch effect. In your hypothesized study, however, your non-treatment factor is genotype. Usually one would design a study of this type to see whether the treatment effect differs between genotypes. So you need to fit a model that allows the treatment effects to be different between genotypes, for example Model 1: ~ genotype * treatment which is equivalent to ~genotype+treatment+genotype:treatment, or Model 2: ~ genotype + genotype:treatment These models are all equivalent, except for parametrization, and all have six coefficients. With these model you cannot test for "treatment effect irrespective of genotype", because there is no such thing. You are making the assumption that the treatment effect may depends on genotype, hence it makes no scientific sense to talk about a treatment effect without regard to the genotype it applies to. If you want to know which genes have a treatment effect in genotype A, you would fit Model 2 and test for coef=4. For treatment effect in genotype B, test for coef=5, and for treatment effect in genotype C, test for coef=6. If you want to know which genes have treatment effects that differ between the genotypes, then you would test for interaction. You do this by fitting Model 1 and testing for "coef=4:6". Not that this is one test, not three tests. Best wishes Gordon > Date: Wed, 28 Mar 2012 15:11:59 +0200 > From: "Henning Wildhagen" <hwildhagen at="" gmx.de=""> > To: bioconductor at r-project.org > Subject: [BioC] edgeR: Specifying the "coef"-argument in glmLRT in a > two factor study > > Hi all, > > i am analysing a two-factorial RNA-seq experiment using edgeR. The design of my study has two factors, genotype and treatment. > Genotype has three levels (A,B,C), "treatment" has two levels ("control", "stress"). > I have some questions regarding the usage of the "coef" and "contrast"-argument using the glmLRT()-function as outlined below: > > ##################################################################### > BEGIN CODE > library(edgeR) > # Generate a dataframe of neg.binom counts > set.seed(50) > s <- as.data.frame(matrix(rnbinom(3000,mu=20,size=20),ncol=3)) > u <- as.data.frame(matrix(rnbinom(3000,mu=30,size=25),ncol=3)) > v <- as.data.frame(matrix(rnbinom(3000,mu=40,size=20),ncol=3)) > w <- as.data.frame(matrix(rnbinom(3000,mu=50,size=25),ncol=3)) > x <- as.data.frame(matrix(rnbinom(3000,mu=60,size=20),ncol=3)) > y <- as.data.frame(matrix(rnbinom(3000,mu=70,size=25),ncol=3)) > z <- as.data.frame(cbind(s,u,v,w,x,y)) > names(z) <- c("A_Con1", "A_Con2","A_Con3","A_Str1", "A_Str2","A_Str3", > "B_Con1","B_Con2","B_Con3","B_Str1","B_Str2","B_Str3", > "C_Con1","C_Con2","C_Con3","C_Str1","C_Str2","C_Str3") > > # Specify factors > genotype <- as.factor(c(rep("A",6),rep("B",6),rep("C",6))) > treatment <-as.factor(rep(c(rep("Control",3),rep("Stress",3)),3)) > > # DGEList-object > groups <- c("A_Con", "A_Con","A_Con","A_Str", "A_Str","A_Str", > "B_Con","B_Con","B_Con","B_Str","B_Str","B_Str", > "C_Con","C_Con","C_Con","C_Str","C_Str","C_Str") > > tmp1 <- DGEList(counts=z,group=groups) > tmp1 <- calcNormFactors(tmp1) > > # Design matrix > design <- model.matrix(~genotype*treatment) > > # Estimating Cox-Reid tagwise dispersion > tmp2 <- getPriorN(tmp1, design=design,prior.df=20) > tmp1 <- estimateTagwiseDisp(tmp1,prior.n=tmp2,trend="movingave",prop .used=0.3,grid.length=500) > summary(tmp1$tagwise.dispersion) > tmp1$common.disp > > # Calling DE genes with the full model > glmfit.tmp1 <- glmFit(tmp1,design=design,dispersion=tmp1$tagwise.dis persion,method="auto") > > # Testing for the effect of "treatment" > lrt.tmp1.treat <- glmLRT(tmp1,glmfit.tmp1,coef=4) # > treat <- topTags(lrt.tmp1.treat,n=nrow(lrt.tmp1.treat)) > treat1 <- treat[[1]]$adj.P.Val < 0.5 > table(treat1) > # 2 DE genes > > END CODE > ################################################################## > > According to the vignette and previous posts on this issue, specifying coef=4 should test for the difference between > control and stress samples, irrespective of the level of the second factor. > However, since in this design matrix, the reference cell is "genotypeA-treatmentControl", i have the impression that specifying > coef=4 in the glmLRT()-function tests for the effect of treatment specifically on genotypeA rather than for the treatment > effect irrespective of "genotype". >> From working through the package "contrast", i conclude that for testing for the effect of treatment averaged over genotype, > i need to specify a contrast as follows: > > ### BEGIN CODE > lrt.tmp1.trt.av <- glmLRT(tmp1,glmfit.tmp1,contrast=c(0,0,0,1,1/3,1/3)) > treat.av <- topTags(lrt.tmp1.trt.av,n=nrow(lrt.tmp1.trt.av)) > treat.av1 <- treat.av[[1]]$adj.P.Val < 0.5 > table(treat.av1) > # 9 DE genes > sessionInfo() > #R version 2.14.0 (2011-10-31) > #Platform: x86_64-pc-linux-gnu (64-bit) > # > #locale: > # [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C > # [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 > # [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8 > # [7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8 > # [9] LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8 > #[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8 > # > #attached base packages: > #[1] stats graphics grDevices utils datasets methods base > # > #other attached packages: > #[1] edgeR_2.4.3 limma_3.10.3 JGR_1.7-9 iplots_1.1-4 JavaGD_0.5-5 > #[6] rJava_0.9-3 > # > #loaded via a namespace (and not attached): > #[1] tools_2.14.0 > ### END CODE > > So, my question is, do i really test for the treatment effect irrespective of genotype by specifiying "coef=4" (=contrast=c(0,0,0,1,0,0)) or rather > for the effect of treatment for the reference genotype ("A"). In the latter case, what would then be the correct approach to test > for the effect of treatment irrespective of genotype? > In either case, did i get it right that the obtained p-Values indicate the significance of the effect rather than for the test that the coefficients > differ from zero? > > Thanks for your help, Henning > > > ---------------------------------------------- > Dr. Henning Wildhagen > Chair of Tree Physiology, University of Freiburg > Germany > -- ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

edgeR edgeR • 3.2k views

ADD COMMENT • link 12.1 years ago Gordon Smyth 50k