Entering edit mode
Henning Wildhagen
▴
30
@henning-wildhagen-5190
Last seen 9.6 years ago
Hi all,
i have some questions regarding a two-factor-glm using DESeq. The
design of my study has two factors, genotype and treatment. Genotype
has three levels (A,B,C), "treatment" has two levels ("control",
"stress").
I am interested in the following questions:
1. Which genes are affected by "treatment"?
2. Which genes are affected by "genotype"?
3. Which genes are affected by both, "treatment" and "genotype"?
4. Which genes are affected by an interaction of "treatment" and
"genotype"?
Pairwise comparisons:
5. Which genes are affected by "treatment" in a particular genotype?
6. Which genes are affected by "genotype" in a particular treatment
There are some posts concerning similar questions in the archive, e.g.
https://stat.ethz.ch/pipermail/bioconductor/2012-January/042823.html
However, it is still not clear to me how to correctly address these
questions using DESeq.
Let me delineate my problems using some code:
###### Start Code
library(DESeq)
# Generate a dataframe of neg.binom counts
set.seed(50)
s <- as.data.frame(matrix(rnbinom(750,mu=20,size=20),ncol=3))
u <- as.data.frame(matrix(rnbinom(750,mu=30,size=25),ncol=3))
v <- as.data.frame(matrix(rnbinom(750,mu=40,size=20),ncol=3))
w <- as.data.frame(matrix(rnbinom(750,mu=50,size=25),ncol=3))
x <- as.data.frame(matrix(rnbinom(750,mu=60,size=20),ncol=3))
y <- as.data.frame(matrix(rnbinom(750,mu=70,size=25),ncol=3))
z <- as.data.frame(cbind(s,u,v,w,x,y))
names(z) <- c("A_Con1", "A_Con2","A_Con3","A_Str1", "A_Str2","A_Str3",
"B_Con1","B_Con2","B_Con3","B_Str1","B_Str2","B_Str3",
"C_Con1","C_Con2","C_Con3","C_Str1","C_Str2","C_Str3")
# Design matrix
genotype <- c(rep("A",6),rep("B",6),rep("C",6))
treatment <-rep(c(rep("Control",3),rep("Stress",3)),3)
Design <-
as.data.frame(cbind(genotype,treatment),row.names=colnames(z))
# Creating the CountDataSet-Object
CDS <- newCountDataSet(z,Design)
### Calculating size factors
CDS <- estimateSizeFactors(CDS)
### Estimate dispersion
CDS <- estimateDispersions(CDS,method="pooled")
### Models
fit0 <- fitNbinomGLMs(CDS,count~1)
fit1 <- fitNbinomGLMs(CDS,count~treatment)
fit2 <- fitNbinomGLMs(CDS,count~genotype)
fit3 <- fitNbinomGLMs(CDS,count~treatment+genotype)
fit4 <- fitNbinomGLMs(CDS,count~treatment*genotype)
### Inference
#
# 1. Which genes are affected by treatment?
Pvals_Trt <- nbinomGLMTest(fit1,fit0)
Padj_Trt <- p.adjust(Pvals_Trt,method="BH")
tmp1 <- Padj_Trt < 0.3 # arbitrary value to get some affected genes
length(Padj_Trt[tmp1==TRUE])
# There is one gene affected by treatment
# 2. Which genes are affected by genotype?
Pvals_gen <- nbinomGLMTest(fit2,fit0)
Padj_gen <- p.adjust(Pvals_gen,method="BH")
tmp2 <- Padj_gen < 0.3
length(Padj_gen[tmp2==TRUE])
# 2 genes are affected by treatment
# 3. Which genes are affected by both, treatment and genotype?
#In this case, the full model is "fit3", but what is the correct
reduced model, #"fit0", "fit1" or "fit2"?
#Since I expect a stronger effect of "treatment" compared to
"genotype" in my #real data, i decided to go on with to test for an
additional effect of #"genotype" on genes affected by "treatment":
Pvals <- nbinomGLMTest(fit3,fit1)
Padj <- p.adjust(Pvals,method="BH")
tmp3 <- Padj < 0.3
length(Padj[tmp3==TRUE])
# for 2 genes, there is an additional effect of "genotype"
# 4. Which genes are affected by an interaction of "treatment" and
"genotype"?
# Intuitively, i would test for this by comparing "fit3" and "fit4",
but again # I am not absolutely sure that "fit3" is the
# correct choice for the reduced model?
Pvals_int <- nbinomGLMTest(fit4,fit3)
Padj_int <- p.adjust(Pvals_int,method="BH")
tmp4 <- Padj_int < 0.3
length(Padj_int[tmp4==TRUE])
# no gene is affected by an interaction of both factors
sessionInfo()
#R version 2.14.0 (2011-10-31)
#Platform: x86_64-pc-linux-gnu (64-bit)
#
#locale:
# [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
# [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8
# [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
# [7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8
# [9] LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8
#[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] DESeq_1.6.1 locfit_1.5-6 lattice_0.20-0 akima_0.5-7
Biobase_2.14.0
#[6] JGR_1.7-9 iplots_1.1-4 JavaGD_0.5-5 rJava_0.9-3
#
#loaded via a namespace (and not attached):
# [1] annotate_1.32.1 AnnotationDbi_1.16.18 DBI_0.2-5
# [4] genefilter_1.36.0 geneplotter_1.32.1 grid_2.14.0
# [7] IRanges_1.12.6 RColorBrewer_1.0-5 RSQLite_0.11.1
#[10] splines_2.14.0 survival_2.36-12 tools_2.14.0
#[13] xtable_1.7-0
##### END code
Now, since both factors have an effect on some genes, i would like to
know:
5. Which genes are affected by treatment in a particular genotype, say
genotype "A"?
In modelling approach, this question is addressed by selecting
contrasts or coefficient from the model matrix. In the
vignette of DESeq, there is an example of a two-factorial design, but
it is not clear to me whether DESeq has implemented an option to
specify contrasts.
If this is not the case, then the only way to answer question number 5
would be to run DESeq genotype-wise and check for treatment-effects
in one-factor models (separate models per genotype)?
Thanks for your comments and help,
Henning
----------------------------------------------
Dr. Henning Wildhagen
Chair of Tree Physiology, University of Freiburg, Germany
--
Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a