Search
Question: DESeq2 Model Design
2
3.5 years ago by
United Kingdom
andrew.j.skelton73310 wrote:

Hi,

I've done a model design and I hope someone can help out with my understanding of it!

I have an experimental setup that looks something like this:

3 Time points (0hrs, 6hrs, 12hrs)

3 Different Conditions (Treatments A, B and C)

So that makes 9 different combinations of time points and treatments, each is in triplicate. There are therefore27 Samples

My design formula is : ~ Treatment + TimePoint + Treatment:TimePoint

My current understanding is this will give me small pValues of Treatment-specific effects over time?

I wanted to further refine this and look at a specific treatment and how it differs between two time points. So I used the following line:

foo <- list(c(“TreatmentA.TimePoint12hrs"), c(“TreatmentA.TimePoint0hrs"))

resMFType <- results(dds, contrast=foo)

EDIT::

as.data.frame(colData(dds))

TimePoint Treatment
X1        0hr         A
X2        0hr         B
X3        0hr         C
X4        0hr         A
X5        0hr         B
X6        0hr         C
X7        0hr         A
X8        0hr         B
X9        0hr         C
X10       6hr         A
X11       6hr         B
X12       6hr         C
X13       6hr         A
X14       6hr         B
X15       6hr         C
X16       6hr         A
X17       6hr         B
X18       6hr         C
X19      12hr         A
X20      12hr         B
X21      12hr         C
X22      12hr         A
X23      12hr         B
X24      12hr         C
X25      12hr         A
X26      12hr         B
X27      12hr         C

sessionInfo()

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[5] GenomeInfoDb_1.2.3      IRanges_2.0.0           S4Vectors_0.4.0         BiocGenerics_0.12.1

loaded via a namespace (and not attached):
[1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1 base64enc_0.1-2
[5] BatchJobs_1.5        BBmisc_1.8           Biobase_2.26.0       BiocParallel_1.0.0
[9] brew_1.0-6           checkmate_1.5.0      cluster_1.15.3       codetools_0.2-9
[13] colorspace_1.2-4     DBI_0.3.1            digest_0.6.4         fail_1.2
[17] foreach_1.4.2        foreign_0.8-61       Formula_1.1-2        genefilter_1.48.1
[21] geneplotter_1.44.0   ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2
[25] Hmisc_3.14-5         iterators_1.0.7      lattice_0.20-29      latticeExtra_0.6-26
[29] locfit_1.5-9.1       MASS_7.3-35          munsell_0.4.2        nnet_7.3-8
[33] plyr_1.8.1           proto_0.3-10         RColorBrewer_1.0-5   reshape2_1.4
[37] rpart_4.1-8          RSQLite_1.0.0        scales_0.2.4         sendmailR_1.2-1
[41] splines_3.1.2        stringr_0.6.2        survival_2.37-7      tools_3.1.2
[45] XML_3.98-1.1         xtable_1.7-4         XVector_0.6.0  

Is this correct?

Thanks,

modified 3.5 years ago by James W. MacDonald46k • written 3.5 years ago by andrew.j.skelton73310

hi Andrew,

Just to make sure we give the correct response, can you paste the output of as.data.frame(colData(dds))

and also paste the output of sessionInfo()

Hi Michael, See my edit above, Thanks,

7
3.5 years ago by
Michael Love17k
United States
Michael Love17k wrote:

"My current understanding is this will give me small pValues of Treatment-specific effects over time?"

The way to get a single p-value for each gene for any treatment-specific effects over time is to use the likelihood ratio test:

dds = DESeq(dds, test="LRT", reduced=~Treatment + TimePoint)
results(dds)

Where the results table will give a LRT statistic and p-value. Note that there are many LFCs involved in this likelihood ratio test (all the interaction effects between treatment and time), but the software has to pick just one to print in the table. It will just print the last one based on the ordering of levels. You can change this column with the 'name' argument. But you will notice that the statistic and p-value do not change, because these are based on all the interaction terms.

"I wanted to further refine this and look at a specific treatment and how it differs between two time points. So I used the following line...list(c(“TreatmentA.TimePoint12hrs"), c(“TreatmentA.TimePoint0hrs"))"

That line is correct if you used just DESeq(dds).

If you used the LRT instead, then the interaction terms for base levels are absorbed by the intercept. This means the Treatment A comparison would be:

results(dds, name="TimePoint_12hr_vs_0hr", test="Wald")

...where specifying Wald means that we want p-values for this LFC only.

and for Treatment B this would be:

results(dds, contrast=list(c("TimePoint_12hr_vs_0hr","TreatmentB.TimePoint12hr")), test="Wald")

...which is the sum of the 12 vs 0 effect for the base level (treatment A) and the interaction effect of 12 vs 0 in treatment B.

Using DESeq(...,test="LRT") or just DESeq() are equally valid. The first is better to run if you want to test for any effect, and then make individual comparisons using results(). The second is perhaps easier syntax for comparing individual timepoints.

That is a very comprehensive response. Thank you very much for such a detailed explanation, that helps clear things up for me!

Hi Michael,

A follow up question - if you don't mind

What would be the best way to design a contrast (or LRT), to look at the differences between Treatment A and Treatment B at a specific time point?

i.e. TreatmentB.0hrs-TreatmentA.0hrs

thanks,

2

The treatment B vs A effect at a specific time, say 12 hr, is the main effect (the treatment B vs A effect for the base level of time) and the additional effect at that specific time:

results(dds, contrast=list(c("Treatment_B_vs_A","TreatmentB.TimePoint12hr")), test="Wald")

except for time = 0, which is the base level, and so the treatment B vs A effect is just:

results(dds, name="Treatment_B_vs_A", test="Wald")

That's great, cheers Michael. Slight followup to all of this - is there a reason I can't create a contrast to look between 6hrs and 12hrs? - I can create contrasts to look at 12hrs vs 0hrs and 6hrs vs 0hrs, but not 12hrs vs 6hrs...

1

If you want to contrast two non-base levels, you need to put one level in the numerator, and one level in the denominator (see the contrast argument description in ?results for more information and the Examples section of ?results).

For treatment A, you compare 12 vs 6 like so:

results(dds, contrast=list("TimePoint_12hr_vs_0hr","TimePoint_6hr_vs_0hr"), test="Wald")

For treatment B, you need to also add the interaction effect for B at time 12 and for B at time 6:

results(dds, contrast=list(c("TimePoint_12hr_vs_0hr","TreatmentB.TimePoint12hr"),
c("TimePoint_6hr_vs_0hr","TreatmentB.TimePoint6hr")), test="Wald")

Hi Michael, yet another question if you've got time! Is there a way to explore effects crossing between both time and treatment. For example, if I wanted to see the effects of 0hr TreatmentA Vs 12hr TreatmentC

I thought it'd be something similar to:

results(dds, contrast=list(c("TimePoint_12hr_vs_0hr"),
c("TimePoint_12hr_vs_0hr","TreatmentC.TimePoint12hr")), test="Wald")

This doesn't appear to play ball, and I'm not 100% sure I'm doing the right thing. My thinking is that this should show the interaction effect of TreatmentA 0Hrs Vs TreatmentA 12Hrs Against TreatmentC 0Hrs Vs TreatmentC 12Hrs

1

It doesn't make sense to have an effect in the numerator and the denominator of the fold change, as it will cancel, e.g.

A * B / A

Comparing 12hr C vs 0hr A is to simply add the log fold changes: 12hr vs 0hr, C vs A, and the interaction of C and 12hr.

So something like:

results(dds, contrast=list(c("time_12_vs_0","treat_C_vs_A","treat_C.time_12")))

Note that i switched the order from your question (you had "0hr TreatmentA Vs 12hr TreatmentC "), because the fold changes are relative to the reference levels, so it makes more sense usually to compare 12 vs 0, C vs A, etc.

Ah that makes a lot more sense.

Out of curiosity, going back to the beginning of the experiment. If I had paired samples, what would I have to add to the model design to cover the pairings? Or would I have to change the nature of the analysis from scratch?

I've tried:

~ Pairing + Treatment + TimePoint + Treatment:TimePoint

Does that makes sense?

yes, that's the recommended way to include pair information

Hi Michael and Andrew, I have a question that I was puzzling. To look at differences between two different time points for a specific treatment, using the reduced model as: reduced=~Treatment + TimePoint. To look at the differences between Treatment A and Treatment B at a specific time point, does the reduced model need to change into: reduced=~ TimePoint+ Treatment? And how about the design formula, is it still : ~ Treatment + TimePoint + Treatment:TimePoint

Emily

Remember, the likelihood ratio test is testing *all* time points. This is not the right way to compare specific effects in DESeq2.

If you want to compare specific time points within a treatment or the treatment differences within a single time point you can use, e.g.:

results(dds, test="Wald", contrast=list("treatmentC.time6hr", "treatmentB.time6hr"))

Note that you can run this Wald test on the output of DESeq(dds, test="LRT"). So you can build and LRT results table for all time points, and then test specific comparisons using the code above. The names used in the contrast should come from resultsNames(dds)

Thanks Michael,

So you mean that I do not need to change the full modle or reduced modle when I try to compare specific time points within a treatment or the treatment differences within a single time point, and the precedence order of the "Treatment" and "TimePoint" in the full or reduced modle will not affact the final results?

Yes.

Thank you very much, Michael.

Cheers!