Entering edit mode
Henning Wildhagen
▴
30
@henning-wildhagen-5190
Last seen 10.6 years ago
Hi all,
i am analysing a two-factorial RNA-seq experiment using edgeR. The
design of my study has two factors, genotype and treatment.
Genotype has three levels (A,B,C), "treatment" has two levels
("control", "stress").
I have some questions regarding the usage of the "coef" and
"contrast"-argument using the glmLRT()-function as outlined below:
#####################################################################
BEGIN CODE
library(edgeR)
# Generate a dataframe of neg.binom counts
set.seed(50)
s <- as.data.frame(matrix(rnbinom(3000,mu=20,size=20),ncol=3))
u <- as.data.frame(matrix(rnbinom(3000,mu=30,size=25),ncol=3))
v <- as.data.frame(matrix(rnbinom(3000,mu=40,size=20),ncol=3))
w <- as.data.frame(matrix(rnbinom(3000,mu=50,size=25),ncol=3))
x <- as.data.frame(matrix(rnbinom(3000,mu=60,size=20),ncol=3))
y <- as.data.frame(matrix(rnbinom(3000,mu=70,size=25),ncol=3))
z <- as.data.frame(cbind(s,u,v,w,x,y))
names(z) <- c("A_Con1", "A_Con2","A_Con3","A_Str1", "A_Str2","A_Str3",
"B_Con1","B_Con2","B_Con3","B_Str1","B_Str2","B_Str3",
"C_Con1","C_Con2","C_Con3","C_Str1","C_Str2","C_Str3")
# Specify factors
genotype <- as.factor(c(rep("A",6),rep("B",6),rep("C",6)))
treatment <-as.factor(rep(c(rep("Control",3),rep("Stress",3)),3))
# DGEList-object
groups <- c("A_Con", "A_Con","A_Con","A_Str", "A_Str","A_Str",
"B_Con","B_Con","B_Con","B_Str","B_Str","B_Str",
"C_Con","C_Con","C_Con","C_Str","C_Str","C_Str")
tmp1 <- DGEList(counts=z,group=groups)
tmp1 <- calcNormFactors(tmp1)
# Design matrix
design <- model.matrix(~genotype*treatment)
# Estimating Cox-Reid tagwise dispersion
tmp2 <- getPriorN(tmp1, design=design,prior.df=20)
tmp1 <- estimateTagwiseDisp(tmp1,prior.n=tmp2,trend="movingave",prop.u
sed=0.3,grid.length=500)
summary(tmp1$tagwise.dispersion)
tmp1$common.disp
# Calling DE genes with the full model
glmfit.tmp1 <- glmFit(tmp1,design=design,dispersion=tmp1$tagwise.dispe
rsion,method="auto")
# Testing for the effect of "treatment"
lrt.tmp1.treat <- glmLRT(tmp1,glmfit.tmp1,coef=4) #
treat <- topTags(lrt.tmp1.treat,n=nrow(lrt.tmp1.treat))
treat1 <- treat[[1]]$adj.P.Val < 0.5
table(treat1)
# 2 DE genes
END CODE
##################################################################
According to the vignette and previous posts on this issue, specifying
coef=4 should test for the difference between
control and stress samples, irrespective of the level of the second
factor.
However, since in this design matrix, the reference cell is
"genotypeA-treatmentControl", i have the impression that specifying
coef=4 in the glmLRT()-function tests for the effect of treatment
specifically on genotypeA rather than for the treatment
effect irrespective of "genotype".
>From working through the package "contrast", i conclude that for
testing for the effect of treatment averaged over genotype,
i need to specify a contrast as follows:
### BEGIN CODE
lrt.tmp1.trt.av <-
glmLRT(tmp1,glmfit.tmp1,contrast=c(0,0,0,1,1/3,1/3))
treat.av <- topTags(lrt.tmp1.trt.av,n=nrow(lrt.tmp1.trt.av))
treat.av1 <- treat.av[[1]]$adj.P.Val < 0.5
table(treat.av1)
# 9 DE genes
sessionInfo()
#R version 2.14.0 (2011-10-31)
#Platform: x86_64-pc-linux-gnu (64-bit)
#
#locale:
# [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
# [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
# [7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8
# [9] LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8
#[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] edgeR_2.4.3 limma_3.10.3 JGR_1.7-9 iplots_1.1-4 JavaGD_0.5-5
#[6] rJava_0.9-3
#
#loaded via a namespace (and not attached):
#[1] tools_2.14.0
### END CODE
So, my question is, do i really test for the treatment effect
irrespective of genotype by specifiying "coef=4"
(=contrast=c(0,0,0,1,0,0)) or rather
for the effect of treatment for the reference genotype ("A"). In the
latter case, what would then be the correct approach to test
for the effect of treatment irrespective of genotype?
In either case, did i get it right that the obtained p-Values indicate
the significance of the effect rather than for the test that the
coefficients
differ from zero?
Thanks for your help, Henning
----------------------------------------------
Dr. Henning Wildhagen
Chair of Tree Physiology, University of Freiburg
Germany
--