Where to include an interaction term in DESeq2
1
0
Entering edit mode
ksilkaitis • 0
@ksilkaitis-7281
Last seen 9.3 years ago
United States

Hello, I have an experiment without replicates, with three treatments (white, blue, and untreated) and three genotypes (mutant, wild type, and another genotype we call "four"):

blue    wt
white    mt
blue    four
untr    wt
white    four
untr    mt
blue    mt
white    wt
untr    four

I want to find out whether i) the genotypes show differential expression; ii) if there are differentially expressed genes between treatments; iii) if genotype influences response to treatment. For this last question, I believe I want to include an interaction term. I'm not sure, however, where I need to specify this. My code so far is

sampleFiles<-c("KS1.read_count","KS2.read_count","KS3.read_count","KS4.read_count","KS5.read_count","KS6.read_count","KS7.read_count","KS8.read_count","KS9.read_count")
sampleCondition<-c("blue","white","blue","t0","white","t0","blue","white","t0")
sampleGenotype<-c("wt","mt","four","wt","four","mt","mt","wt","four")
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition,type=sampleGenotype)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=getwd(), design=~type+condition)
colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("white","blue","t0"))
colData(ddsHTSeq)$type<-factor(colData(ddsHTSeq)$type, levels=c("four","wt","mt"))
dds<-DESeq(ddsHTSeq)
res<-results(dds)[order(results(dds)$padj),]

My first question is if I need to include the interaction term in the "design" argument of the "DESeqDataSetFromHTSeqCount" argument.

I began following the vignette for multi-factor designs and included an interaction term :

ddsMF <- dds
resMF <- results(ddsMF)
design(ddsMF) <- formula(~ type + condition + type:condition)
ddsMF <- DESeq(ddsMF)

I will contrast two genotypes first, and so use the following code:

resMFType <- results(ddsMF, contrast=c("type","mt","wt"))
head(resMFType)

I believe that in specifying the formula above, the variable of interest should be last, but I am interested in all three; do I specify a different design formula for each question, and would thus not include the interaction term when I want to contrast genotypes? And how does this "design" relate to the "design" in the "DESeqDataSetFromHTSeqCount" argument?

Thank you for your help,

Katherine

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DESeq2_1.6.3            RcppArmadillo_0.4.600.0 Rcpp_0.11.3             GenomicRanges_1.18.4    GenomeInfoDb_1.2.4     
[6] IRanges_2.0.1           S4Vectors_0.4.0         BiocGenerics_0.12.1    

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1 base64enc_0.1-2      BatchJobs_1.5       
 [6] BBmisc_1.8           Biobase_2.26.0       BiocParallel_1.0.0   brew_1.0-6           checkmate_1.5.1     
[11] cluster_1.15.3       codetools_0.2-9      colorspace_1.2-4     DBI_0.3.1            digest_0.6.8        
[16] fail_1.2             foreach_1.4.2        foreign_0.8-61       Formula_1.1-2        genefilter_1.48.1   
[21] geneplotter_1.44.0   ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2         Hmisc_3.14-6        
[26] iterators_1.0.7      lattice_0.20-29      latticeExtra_0.6-26  locfit_1.5-9.1       MASS_7.3-35         
[31] munsell_0.4.2        nnet_7.3-8           plyr_1.8.1           proto_0.3-10         RColorBrewer_1.1-2  
[36] reshape2_1.4.1       rpart_4.1-8          RSQLite_1.0.0        scales_0.2.4         sendmailR_1.2-1     
[41] splines_3.1.2        stringr_0.6.2        survival_2.37-7      tools_3.1.2          XML_3.98-1.1        
[46] xtable_1.7-4         XVector_0.6.0       
deseq2 contrast design • 3.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 5 hours ago
United States

"I believe that in specifying the formula above, the variable of interest should be last, but I am interested in all three; do I specify a different design formula for each question, and would thus not include the interaction term when I want to contrast genotypes? And how does this "design" relate to the "design" in the "DESeqDataSetFromHTSeqCount" argument?"

hi Katherine,

The order of the variables only matters for when users run the functions without any arguments, e.g.: results(dds) or plotMA(dds). In this case the software has to pick a variable to make a results table for, or for making the plot, so it uses the last variable, and the last factor level over the first factor level. However, you can use the 'contrast' or 'name' argument to results() to build any comparison, and the order of variables in the design will not matter.

You can change the design after object construction, but you should not change the design after running DESeq(), because the design is used in the DESeq() steps to run the model. So there isn't a point in building the object with one design and then immediately changing to a different design. And it's not such a good idea, because there are steps during object construction and helpful messages about your design.

An issue with the interaction model is that you don't have replicates to test if genotype influences the response to treatment: you only have one sample per unique combination of genotype x treatment. I'd recommend sticking to questions (i) and (ii) using the 'contrast' argument of results() with a design of ~ genotype + treatment. Or you could increase the replication in the experiment.

ADD COMMENT
0
Entering edit mode

Thank you for your response, especially pointing out the difficulty of determining if there is an interaction if there aren't replicates. If I did have replicates, can you clarify if I would include an interaction term in the "design" argument of the "DESeqDataSetFromHTSeqCount" function, as well as in design(ddsMF) <- formula(~ type + condition + type:condition)? I'm unclear if the "DESeqDataSetFromHTSeqCount" function is literally only for experimental design (with only two variables) or if it should be the model design. If it's the latter, I believe design(ddsMF) <- formula(~ type + condition + type:condition) would then be redundant.

Thanks again,

Katherine

ADD REPLY
0
Entering edit mode

You should supply the design to DESeqDataSetFromHTSeqCount only. You don't need to use "design(dds) <-" at all (it does nothing but swap out the design so if you supply the same twice, it does nothing).

ADD REPLY

Login before adding your answer.

Traffic: 382 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6