Hello, I have an experiment without replicates, with three treatments (white, blue, and untreated) and three genotypes (mutant, wild type, and another genotype we call "four"):
blue wt
white mt
blue four
untr wt
white four
untr mt
blue mt
white wt
untr four
I want to find out whether i) the genotypes show differential expression; ii) if there are differentially expressed genes between treatments; iii) if genotype influences response to treatment. For this last question, I believe I want to include an interaction term. I'm not sure, however, where I need to specify this. My code so far is
sampleFiles<-c("KS1.read_count","KS2.read_count","KS3.read_count","KS4.read_count","KS5.read_count","KS6.read_count","KS7.read_count","KS8.read_count","KS9.read_count") sampleCondition<-c("blue","white","blue","t0","white","t0","blue","white","t0") sampleGenotype<-c("wt","mt","four","wt","four","mt","mt","wt","four") sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition,type=sampleGenotype) ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=getwd(), design=~type+condition) colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("white","blue","t0")) colData(ddsHTSeq)$type<-factor(colData(ddsHTSeq)$type, levels=c("four","wt","mt")) dds<-DESeq(ddsHTSeq) res<-results(dds)[order(results(dds)$padj),]
My first question is if I need to include the interaction term in the "design" argument of the "DESeqDataSetFromHTSeqCount" argument.
I began following the vignette for multi-factor designs and included an interaction term :
ddsMF <- dds resMF <- results(ddsMF) design(ddsMF) <- formula(~ type + condition + type:condition) ddsMF <- DESeq(ddsMF)
I will contrast two genotypes first, and so use the following code:
resMFType <- results(ddsMF, contrast=c("type","mt","wt")) head(resMFType)
I believe that in specifying the formula above, the variable of interest should be last, but I am interested in all three; do I specify a different design formula for each question, and would thus not include the interaction term when I want to contrast genotypes? And how does this "design" relate to the "design" in the "DESeqDataSetFromHTSeqCount" argument?
Thank you for your help,
Katherine
> sessionInfo() R version 3.1.2 (2014-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] DESeq2_1.6.3 RcppArmadillo_0.4.600.0 Rcpp_0.11.3 GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 [6] IRanges_2.0.1 S4Vectors_0.4.0 BiocGenerics_0.12.1 loaded via a namespace (and not attached): [1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 base64enc_0.1-2 BatchJobs_1.5 [6] BBmisc_1.8 Biobase_2.26.0 BiocParallel_1.0.0 brew_1.0-6 checkmate_1.5.1 [11] cluster_1.15.3 codetools_0.2-9 colorspace_1.2-4 DBI_0.3.1 digest_0.6.8 [16] fail_1.2 foreach_1.4.2 foreign_0.8-61 Formula_1.1-2 genefilter_1.48.1 [21] geneplotter_1.44.0 ggplot2_1.0.0 grid_3.1.2 gtable_0.1.2 Hmisc_3.14-6 [26] iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-35 [31] munsell_0.4.2 nnet_7.3-8 plyr_1.8.1 proto_0.3-10 RColorBrewer_1.1-2 [36] reshape2_1.4.1 rpart_4.1-8 RSQLite_1.0.0 scales_0.2.4 sendmailR_1.2-1 [41] splines_3.1.2 stringr_0.6.2 survival_2.37-7 tools_3.1.2 XML_3.98-1.1 [46] xtable_1.7-4 XVector_0.6.0
Thank you for your response, especially pointing out the difficulty of determining if there is an interaction if there aren't replicates. If I did have replicates, can you clarify if I would include an interaction term in the "design" argument of the "DESeqDataSetFromHTSeqCount" function, as well as in
design(ddsMF) <- formula(~ type + condition + type:condition)
? I'm unclear if the "DESeqDataSetFromHTSeqCount" function is literally only for experimental design (with only two variables) or if it should be the model design. If it's the latter, I believedesign(ddsMF) <- formula(~ type + condition + type:condition)
would then be redundant.Thanks again,
Katherine
You should supply the design to DESeqDataSetFromHTSeqCount only. You don't need to use "design(dds) <-" at all (it does nothing but swap out the design so if you supply the same twice, it does nothing).