DESeq for two-factor models
0
0
Entering edit mode
@henning-wildhagen-5190
Last seen 9.6 years ago
Hi all, i have some questions regarding a two-factor-glm using DESeq. The design of my study has two factors, genotype and treatment. Genotype has three levels (A,B,C), "treatment" has two levels ("control", "stress"). I am interested in the following questions: 1. Which genes are affected by "treatment"? 2. Which genes are affected by "genotype"? 3. Which genes are affected by both, "treatment" and "genotype"? 4. Which genes are affected by an interaction of "treatment" and "genotype"? Pairwise comparisons: 5. Which genes are affected by "treatment" in a particular genotype? 6. Which genes are affected by "genotype" in a particular treatment There are some posts concerning similar questions in the archive, e.g. https://stat.ethz.ch/pipermail/bioconductor/2012-January/042823.html However, it is still not clear to me how to correctly address these questions using DESeq. Let me delineate my problems using some code: ###### Start Code library(DESeq) # Generate a dataframe of neg.binom counts set.seed(50) s <- as.data.frame(matrix(rnbinom(750,mu=20,size=20),ncol=3)) u <- as.data.frame(matrix(rnbinom(750,mu=30,size=25),ncol=3)) v <- as.data.frame(matrix(rnbinom(750,mu=40,size=20),ncol=3)) w <- as.data.frame(matrix(rnbinom(750,mu=50,size=25),ncol=3)) x <- as.data.frame(matrix(rnbinom(750,mu=60,size=20),ncol=3)) y <- as.data.frame(matrix(rnbinom(750,mu=70,size=25),ncol=3)) z <- as.data.frame(cbind(s,u,v,w,x,y)) names(z) <- c("A_Con1", "A_Con2","A_Con3","A_Str1", "A_Str2","A_Str3", "B_Con1","B_Con2","B_Con3","B_Str1","B_Str2","B_Str3", "C_Con1","C_Con2","C_Con3","C_Str1","C_Str2","C_Str3") # Design matrix genotype <- c(rep("A",6),rep("B",6),rep("C",6)) treatment <-rep(c(rep("Control",3),rep("Stress",3)),3) Design <- as.data.frame(cbind(genotype,treatment),row.names=colnames(z)) # Creating the CountDataSet-Object CDS <- newCountDataSet(z,Design) ### Calculating size factors CDS <- estimateSizeFactors(CDS) ### Estimate dispersion CDS <- estimateDispersions(CDS,method="pooled") ### Models fit0 <- fitNbinomGLMs(CDS,count~1) fit1 <- fitNbinomGLMs(CDS,count~treatment) fit2 <- fitNbinomGLMs(CDS,count~genotype) fit3 <- fitNbinomGLMs(CDS,count~treatment+genotype) fit4 <- fitNbinomGLMs(CDS,count~treatment*genotype) ### Inference # # 1. Which genes are affected by treatment? Pvals_Trt <- nbinomGLMTest(fit1,fit0) Padj_Trt <- p.adjust(Pvals_Trt,method="BH") tmp1 <- Padj_Trt < 0.3 # arbitrary value to get some affected genes length(Padj_Trt[tmp1==TRUE]) # There is one gene affected by treatment # 2. Which genes are affected by genotype? Pvals_gen <- nbinomGLMTest(fit2,fit0) Padj_gen <- p.adjust(Pvals_gen,method="BH") tmp2 <- Padj_gen < 0.3 length(Padj_gen[tmp2==TRUE]) # 2 genes are affected by treatment # 3. Which genes are affected by both, treatment and genotype? #In this case, the full model is "fit3", but what is the correct reduced model, #"fit0", "fit1" or "fit2"? #Since I expect a stronger effect of "treatment" compared to "genotype" in my #real data, i decided to go on with to test for an additional effect of #"genotype" on genes affected by "treatment": Pvals <- nbinomGLMTest(fit3,fit1) Padj <- p.adjust(Pvals,method="BH") tmp3 <- Padj < 0.3 length(Padj[tmp3==TRUE]) # for 2 genes, there is an additional effect of "genotype" # 4. Which genes are affected by an interaction of "treatment" and "genotype"? # Intuitively, i would test for this by comparing "fit3" and "fit4", but again # I am not absolutely sure that "fit3" is the # correct choice for the reduced model? Pvals_int <- nbinomGLMTest(fit4,fit3) Padj_int <- p.adjust(Pvals_int,method="BH") tmp4 <- Padj_int < 0.3 length(Padj_int[tmp4==TRUE]) # no gene is affected by an interaction of both factors sessionInfo() #R version 2.14.0 (2011-10-31) #Platform: x86_64-pc-linux-gnu (64-bit) # #locale: # [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C # [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 # [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8 # [7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8 # [9] LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8 #[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8 # #attached base packages: #[1] stats graphics grDevices utils datasets methods base # #other attached packages: #[1] DESeq_1.6.1 locfit_1.5-6 lattice_0.20-0 akima_0.5-7 Biobase_2.14.0 #[6] JGR_1.7-9 iplots_1.1-4 JavaGD_0.5-5 rJava_0.9-3 # #loaded via a namespace (and not attached): # [1] annotate_1.32.1 AnnotationDbi_1.16.18 DBI_0.2-5 # [4] genefilter_1.36.0 geneplotter_1.32.1 grid_2.14.0 # [7] IRanges_1.12.6 RColorBrewer_1.0-5 RSQLite_0.11.1 #[10] splines_2.14.0 survival_2.36-12 tools_2.14.0 #[13] xtable_1.7-0 ##### END code Now, since both factors have an effect on some genes, i would like to know: 5. Which genes are affected by treatment in a particular genotype, say genotype "A"? In modelling approach, this question is addressed by selecting contrasts or coefficient from the model matrix. In the vignette of DESeq, there is an example of a two-factorial design, but it is not clear to me whether DESeq has implemented an option to specify contrasts. If this is not the case, then the only way to answer question number 5 would be to run DESeq genotype-wise and check for treatment-effects in one-factor models (separate models per genotype)? Thanks for your comments and help, Henning ---------------------------------------------- Dr. Henning Wildhagen Chair of Tree Physiology, University of Freiburg, Germany -- Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a
GO DESeq GO DESeq • 2.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 771 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6