Question

edgeR: Specifying the "coef"-argument in glmLRT in a two factor study

0

Entering edit mode

Henning Wildhagen ▴ 30

@henning-wildhagen-5190

Last seen 9.6 years ago

Hi all, i am analysing a two-factorial RNA-seq experiment using edgeR. The design of my study has two factors, genotype and treatment. Genotype has three levels (A,B,C), "treatment" has two levels ("control", "stress"). I have some questions regarding the usage of the "coef" and "contrast"-argument using the glmLRT()-function as outlined below: ##################################################################### BEGIN CODE library(edgeR) # Generate a dataframe of neg.binom counts set.seed(50) s <- as.data.frame(matrix(rnbinom(3000,mu=20,size=20),ncol=3)) u <- as.data.frame(matrix(rnbinom(3000,mu=30,size=25),ncol=3)) v <- as.data.frame(matrix(rnbinom(3000,mu=40,size=20),ncol=3)) w <- as.data.frame(matrix(rnbinom(3000,mu=50,size=25),ncol=3)) x <- as.data.frame(matrix(rnbinom(3000,mu=60,size=20),ncol=3)) y <- as.data.frame(matrix(rnbinom(3000,mu=70,size=25),ncol=3)) z <- as.data.frame(cbind(s,u,v,w,x,y)) names(z) <- c("A_Con1", "A_Con2","A_Con3","A_Str1", "A_Str2","A_Str3", "B_Con1","B_Con2","B_Con3","B_Str1","B_Str2","B_Str3", "C_Con1","C_Con2","C_Con3","C_Str1","C_Str2","C_Str3") # Specify factors genotype <- as.factor(c(rep("A",6),rep("B",6),rep("C",6))) treatment <-as.factor(rep(c(rep("Control",3),rep("Stress",3)),3)) # DGEList-object groups <- c("A_Con", "A_Con","A_Con","A_Str", "A_Str","A_Str", "B_Con","B_Con","B_Con","B_Str","B_Str","B_Str", "C_Con","C_Con","C_Con","C_Str","C_Str","C_Str") tmp1 <- DGEList(counts=z,group=groups) tmp1 <- calcNormFactors(tmp1) # Design matrix design <- model.matrix(~genotype*treatment) # Estimating Cox-Reid tagwise dispersion tmp2 <- getPriorN(tmp1, design=design,prior.df=20) tmp1 <- estimateTagwiseDisp(tmp1,prior.n=tmp2,trend="movingave",prop.u sed=0.3,grid.length=500) summary(tmp1$tagwise.dispersion) tmp1$common.disp # Calling DE genes with the full model glmfit.tmp1 <- glmFit(tmp1,design=design,dispersion=tmp1$tagwise.dispe rsion,method="auto") # Testing for the effect of "treatment" lrt.tmp1.treat <- glmLRT(tmp1,glmfit.tmp1,coef=4) # treat <- topTags(lrt.tmp1.treat,n=nrow(lrt.tmp1.treat)) treat1 <- treat[[1]]$adj.P.Val < 0.5 table(treat1) # 2 DE genes END CODE ################################################################## According to the vignette and previous posts on this issue, specifying coef=4 should test for the difference between control and stress samples, irrespective of the level of the second factor. However, since in this design matrix, the reference cell is "genotypeA-treatmentControl", i have the impression that specifying coef=4 in the glmLRT()-function tests for the effect of treatment specifically on genotypeA rather than for the treatment effect irrespective of "genotype". >From working through the package "contrast", i conclude that for testing for the effect of treatment averaged over genotype, i need to specify a contrast as follows: ### BEGIN CODE lrt.tmp1.trt.av <- glmLRT(tmp1,glmfit.tmp1,contrast=c(0,0,0,1,1/3,1/3)) treat.av <- topTags(lrt.tmp1.trt.av,n=nrow(lrt.tmp1.trt.av)) treat.av1 <- treat.av[[1]]$adj.P.Val < 0.5 table(treat.av1) # 9 DE genes sessionInfo() #R version 2.14.0 (2011-10-31) #Platform: x86_64-pc-linux-gnu (64-bit) # #locale: # [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C # [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 # [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8 # [7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8 # [9] LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8 #[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8 # #attached base packages: #[1] stats graphics grDevices utils datasets methods base # #other attached packages: #[1] edgeR_2.4.3 limma_3.10.3 JGR_1.7-9 iplots_1.1-4 JavaGD_0.5-5 #[6] rJava_0.9-3 # #loaded via a namespace (and not attached): #[1] tools_2.14.0 ### END CODE So, my question is, do i really test for the treatment effect irrespective of genotype by specifiying "coef=4" (=contrast=c(0,0,0,1,0,0)) or rather for the effect of treatment for the reference genotype ("A"). In the latter case, what would then be the correct approach to test for the effect of treatment irrespective of genotype? In either case, did i get it right that the obtained p-Values indicate the significance of the effect rather than for the test that the coefficients differ from zero? Thanks for your help, Henning ---------------------------------------------- Dr. Henning Wildhagen Chair of Tree Physiology, University of Freiburg Germany --

edgeR edgeR • 1.7k views

ADD COMMENT • link 12.1 years ago Henning Wildhagen ▴ 30