Search
Question: Where to include an interaction term in DESeq2
0
3.8 years ago by
United States
ksilkaitis0 wrote:

Hello, I have an experiment without replicates, with three treatments (white, blue, and untreated) and three genotypes (mutant, wild type, and another genotype we call "four"):

blue    wt
white    mt
blue    four
untr    wt
white    four
untr    mt
blue    mt
white    wt
untr    four

I want to find out whether i) the genotypes show differential expression; ii) if there are differentially expressed genes between treatments; iii) if genotype influences response to treatment. For this last question, I believe I want to include an interaction term. I'm not sure, however, where I need to specify this. My code so far is

sampleFiles<-c("KS1.read_count","KS2.read_count","KS3.read_count","KS4.read_count","KS5.read_count","KS6.read_count","KS7.read_count","KS8.read_count","KS9.read_count")
sampleCondition<-c("blue","white","blue","t0","white","t0","blue","white","t0")
sampleGenotype<-c("wt","mt","four","wt","four","mt","mt","wt","four")
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition,type=sampleGenotype)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=getwd(), design=~type+condition)
colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("white","blue","t0"))
colData(ddsHTSeq)$type<-factor(colData(ddsHTSeq)$type, levels=c("four","wt","mt"))
dds<-DESeq(ddsHTSeq)


My first question is if I need to include the interaction term in the "design" argument of the "DESeqDataSetFromHTSeqCount" argument.

I began following the vignette for multi-factor designs and included an interaction term :

ddsMF <- dds
resMF <- results(ddsMF)
design(ddsMF) <- formula(~ type + condition + type:condition)
ddsMF <- DESeq(ddsMF)



I will contrast two genotypes first, and so use the following code:

resMFType <- results(ddsMF, contrast=c("type","mt","wt"))
head(resMFType)

I believe that in specifying the formula above, the variable of interest should be last, but I am interested in all three; do I specify a different design formula for each question, and would thus not include the interaction term when I want to contrast genotypes? And how does this "design" relate to the "design" in the "DESeqDataSetFromHTSeqCount" argument?

Thank you for your help,

Katherine

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DESeq2_1.6.3            RcppArmadillo_0.4.600.0 Rcpp_0.11.3             GenomicRanges_1.18.4    GenomeInfoDb_1.2.4
[6] IRanges_2.0.1           S4Vectors_0.4.0         BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1 base64enc_0.1-2      BatchJobs_1.5
[6] BBmisc_1.8           Biobase_2.26.0       BiocParallel_1.0.0   brew_1.0-6           checkmate_1.5.1
[11] cluster_1.15.3       codetools_0.2-9      colorspace_1.2-4     DBI_0.3.1            digest_0.6.8
[16] fail_1.2             foreach_1.4.2        foreign_0.8-61       Formula_1.1-2        genefilter_1.48.1
[21] geneplotter_1.44.0   ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2         Hmisc_3.14-6
[26] iterators_1.0.7      lattice_0.20-29      latticeExtra_0.6-26  locfit_1.5-9.1       MASS_7.3-35
[31] munsell_0.4.2        nnet_7.3-8           plyr_1.8.1           proto_0.3-10         RColorBrewer_1.1-2
[36] reshape2_1.4.1       rpart_4.1-8          RSQLite_1.0.0        scales_0.2.4         sendmailR_1.2-1
[41] splines_3.1.2        stringr_0.6.2        survival_2.37-7      tools_3.1.2          XML_3.98-1.1
[46] xtable_1.7-4         XVector_0.6.0       
modified 3.8 years ago by Michael Love20k • written 3.8 years ago by ksilkaitis0
0
3.8 years ago by
Michael Love20k
United States
Michael Love20k wrote:

"I believe that in specifying the formula above, the variable of interest should be last, but I am interested in all three; do I specify a different design formula for each question, and would thus not include the interaction term when I want to contrast genotypes? And how does this "design" relate to the "design" in the "DESeqDataSetFromHTSeqCount" argument?"

hi Katherine,

The order of the variables only matters for when users run the functions without any arguments, e.g.: results(dds) or plotMA(dds). In this case the software has to pick a variable to make a results table for, or for making the plot, so it uses the last variable, and the last factor level over the first factor level. However, you can use the 'contrast' or 'name' argument to results() to build any comparison, and the order of variables in the design will not matter.

You can change the design after object construction, but you should not change the design after running DESeq(), because the design is used in the DESeq() steps to run the model. So there isn't a point in building the object with one design and then immediately changing to a different design. And it's not such a good idea, because there are steps during object construction and helpful messages about your design.

An issue with the interaction model is that you don't have replicates to test if genotype influences the response to treatment: you only have one sample per unique combination of genotype x treatment. I'd recommend sticking to questions (i) and (ii) using the 'contrast' argument of results() with a design of ~ genotype + treatment. Or you could increase the replication in the experiment.

Thank you for your response, especially pointing out the difficulty of determining if there is an interaction if there aren't replicates. If I did have replicates, can you clarify if I would include an interaction term in the "design" argument of the "DESeqDataSetFromHTSeqCount" function, as well as in design(ddsMF) <- formula(~ type + condition + type:condition)? I'm unclear if the "DESeqDataSetFromHTSeqCount" function is literally only for experimental design (with only two variables) or if it should be the model design. If it's the latter, I believe design(ddsMF) <- formula(~ type + condition + type:condition) would then be redundant.

Thanks again,

Katherine