Entering edit mode
Dear Henning,
Thanks for including reproducible code. I think however that you
might be
taking comments that were made in the edgeR User's Guide about an
additive
model and trying to apply them to an interaction model. Let me
clarify.
Your experiment has three genotypes and one treatment (stress vs
control).
When we analyse a two factor study, the first question we have to
consider is whether it makes sense to assume that the treatment effect
will be the same for each genotype. If the answer is yes, then you
would
fit an additive linear model:
design <- model.matrix(~ genotype + treatment)
This model has just four coefficients. The 4th coefficient would
indeed
test for a treatment effect irrespective of genotype. This type of
analysis is done in the last two case studies of the edgeR User's
Guide:
http://bioconductor.org/packages/2.10/bioc/vignettes/edgeR/inst/doc/ed
geRUsersGuide.pdf
This type of analysis is appropriate when the non-treatment factor is
a
blocking or nuisance variable that is not of primary scientific
interest,
for example a batch effect.
In your hypothesized study, however, your non-treatment factor is
genotype. Usually one would design a study of this type to see
whether
the treatment effect differs between genotypes. So you need to fit a
model that allows the treatment effects to be different between
genotypes,
for example
Model 1: ~ genotype * treatment
which is equivalent to ~genotype+treatment+genotype:treatment, or
Model 2: ~ genotype + genotype:treatment
These models are all equivalent, except for parametrization, and all
have
six coefficients. With these model you cannot test for "treatment
effect
irrespective of genotype", because there is no such thing. You are
making
the assumption that the treatment effect may depends on genotype,
hence it
makes no scientific sense to talk about a treatment effect without
regard
to the genotype it applies to.
If you want to know which genes have a treatment effect in genotype A,
you
would fit Model 2 and test for coef=4. For treatment effect in
genotype
B, test for coef=5, and for treatment effect in genotype C, test for
coef=6.
If you want to know which genes have treatment effects that differ
between
the genotypes, then you would test for interaction. You do this by
fitting Model 1 and testing for "coef=4:6". Not that this is one
test,
not three tests.
Best wishes
Gordon
> Date: Wed, 28 Mar 2012 15:11:59 +0200
> From: "Henning Wildhagen" <hwildhagen at="" gmx.de="">
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR: Specifying the "coef"-argument in glmLRT in a
> two factor study
>
> Hi all,
>
> i am analysing a two-factorial RNA-seq experiment using edgeR. The
design of my study has two factors, genotype and treatment.
> Genotype has three levels (A,B,C), "treatment" has two levels
("control", "stress").
> I have some questions regarding the usage of the "coef" and
"contrast"-argument using the glmLRT()-function as outlined below:
>
>
#####################################################################
> BEGIN CODE
> library(edgeR)
> # Generate a dataframe of neg.binom counts
> set.seed(50)
> s <- as.data.frame(matrix(rnbinom(3000,mu=20,size=20),ncol=3))
> u <- as.data.frame(matrix(rnbinom(3000,mu=30,size=25),ncol=3))
> v <- as.data.frame(matrix(rnbinom(3000,mu=40,size=20),ncol=3))
> w <- as.data.frame(matrix(rnbinom(3000,mu=50,size=25),ncol=3))
> x <- as.data.frame(matrix(rnbinom(3000,mu=60,size=20),ncol=3))
> y <- as.data.frame(matrix(rnbinom(3000,mu=70,size=25),ncol=3))
> z <- as.data.frame(cbind(s,u,v,w,x,y))
> names(z) <- c("A_Con1", "A_Con2","A_Con3","A_Str1",
"A_Str2","A_Str3",
> "B_Con1","B_Con2","B_Con3","B_Str1","B_Str2","B_Str3",
> "C_Con1","C_Con2","C_Con3","C_Str1","C_Str2","C_Str3")
>
> # Specify factors
> genotype <- as.factor(c(rep("A",6),rep("B",6),rep("C",6)))
> treatment <-as.factor(rep(c(rep("Control",3),rep("Stress",3)),3))
>
> # DGEList-object
> groups <- c("A_Con", "A_Con","A_Con","A_Str", "A_Str","A_Str",
> "B_Con","B_Con","B_Con","B_Str","B_Str","B_Str",
> "C_Con","C_Con","C_Con","C_Str","C_Str","C_Str")
>
> tmp1 <- DGEList(counts=z,group=groups)
> tmp1 <- calcNormFactors(tmp1)
>
> # Design matrix
> design <- model.matrix(~genotype*treatment)
>
> # Estimating Cox-Reid tagwise dispersion
> tmp2 <- getPriorN(tmp1, design=design,prior.df=20)
> tmp1 <- estimateTagwiseDisp(tmp1,prior.n=tmp2,trend="movingave",prop
.used=0.3,grid.length=500)
> summary(tmp1$tagwise.dispersion)
> tmp1$common.disp
>
> # Calling DE genes with the full model
> glmfit.tmp1 <- glmFit(tmp1,design=design,dispersion=tmp1$tagwise.dis
persion,method="auto")
>
> # Testing for the effect of "treatment"
> lrt.tmp1.treat <- glmLRT(tmp1,glmfit.tmp1,coef=4) #
> treat <- topTags(lrt.tmp1.treat,n=nrow(lrt.tmp1.treat))
> treat1 <- treat[[1]]$adj.P.Val < 0.5
> table(treat1)
> # 2 DE genes
>
> END CODE
> ##################################################################
>
> According to the vignette and previous posts on this issue,
specifying coef=4 should test for the difference between
> control and stress samples, irrespective of the level of the second
factor.
> However, since in this design matrix, the reference cell is
"genotypeA-treatmentControl", i have the impression that specifying
> coef=4 in the glmLRT()-function tests for the effect of treatment
specifically on genotypeA rather than for the treatment
> effect irrespective of "genotype".
>> From working through the package "contrast", i conclude that for
testing for the effect of treatment averaged over genotype,
> i need to specify a contrast as follows:
>
> ### BEGIN CODE
> lrt.tmp1.trt.av <-
glmLRT(tmp1,glmfit.tmp1,contrast=c(0,0,0,1,1/3,1/3))
> treat.av <- topTags(lrt.tmp1.trt.av,n=nrow(lrt.tmp1.trt.av))
> treat.av1 <- treat.av[[1]]$adj.P.Val < 0.5
> table(treat.av1)
> # 9 DE genes
> sessionInfo()
> #R version 2.14.0 (2011-10-31)
> #Platform: x86_64-pc-linux-gnu (64-bit)
> #
> #locale:
> # [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
> # [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
> # [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
> # [7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8
> # [9] LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8
> #[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8
> #
> #attached base packages:
> #[1] stats graphics grDevices utils datasets methods
base
> #
> #other attached packages:
> #[1] edgeR_2.4.3 limma_3.10.3 JGR_1.7-9 iplots_1.1-4
JavaGD_0.5-5
> #[6] rJava_0.9-3
> #
> #loaded via a namespace (and not attached):
> #[1] tools_2.14.0
> ### END CODE
>
> So, my question is, do i really test for the treatment effect
irrespective of genotype by specifiying "coef=4"
(=contrast=c(0,0,0,1,0,0)) or rather
> for the effect of treatment for the reference genotype ("A"). In the
latter case, what would then be the correct approach to test
> for the effect of treatment irrespective of genotype?
> In either case, did i get it right that the obtained p-Values
indicate the significance of the effect rather than for the test that
the coefficients
> differ from zero?
>
> Thanks for your help, Henning
>
>
> ----------------------------------------------
> Dr. Henning Wildhagen
> Chair of Tree Physiology, University of Freiburg
> Germany
> --
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}