Question

Appropriate design formula for DESeq2 from principles

galen.seilis • 0

Background

I am analyzing RNA-seq reads to look at transcriptomic response to environmental stress, and I want to ensure that I am using the correct design formula for my experimental design. I have a condition factor with a control level and four other levels in no particular order (CL, A, B, C, D), biological sampling was done in triplicate, and there is a timepoint factor with four ordered levels (T1, T2, T3, T4).

My experimental questions

These are the types of questions I am interested in answering:

For each non-control-level condition, accounting for timepoints, are they different from the control level?
For each condition, including the control level, are their timepoints different in expression level?
All else being equal, which genes were differentially expressed with respect to the reference condition?
Accounting for timepoint, are arbitrary pairs (C vs A) of conditions different from each other?

What I've tried

I've read the DESeq2 paper and the vignette, and rummaged through various posts on the BioStars and Bioconductor forums. I've learned a lot from them about the DESeq2 package and the mathematics it performs, but it is still unclear to me how to write a design formula that answers my questions. I've also followed a tutorial on design formulae in R in general, but it did not clarify the ordering of terms.

Related Questions

What are the rules for ordering the terms in a design formula for DESeq2? (What does ~ A + B vs ~ B + A mean?) I'd like a description of 'the general case' rather than special cases. I'm not a programming or math phobe, so lay it on me.

With the contrast argument of the results function, how do I similarly make comparisons that are conditioned on other factors?

deseq2 • 878 views

updated 5.2 years ago by Michael Love • written 5.2 years ago by galen.seilis

This isn't an answer to your main question, but for an in-depth discussion of how design matrices are constructed from factors, you should have a read through the vignette for the codingMatrices package. You generally won't be working directly with the design matrix in DESeq2, but it's still useful to understand the principles.

Comment • 5.2 years ago • Ryan C. Thompson

score 0 · Answer 1 · 2019-02-12

swbarnes2 ★ 1.4k

AFAIK, the ordering of terms doesn't matter if you specify the contrast you desire in your results statement.
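
The point about term order can be checked on the design matrix itself. Below is a minimal, language-neutral sketch in plain Python (not DESeq2 code) that builds treatment-coded columns for the additive designs ~ condition + time and ~ time + condition. The helper names `dummy_columns` and `design` are made up for illustration, and replicates are left out (one sample per cell) to keep it short. Both orderings give the same set of columns, only in a different sequence, so the fitted model and any explicitly requested contrast are the same. The one place order does show up in DESeq2 is the default: if `results()` is called without `contrast` or `name`, it reports a comparison for the last variable in the design formula.

```python
from itertools import product

def dummy_columns(factor, levels, samples):
    """Treatment (dummy) coding: one 0/1 column per non-reference level."""
    ref, *others = levels  # the first level is taken as the reference
    return {f"{factor}{lvl}": [1 if s[factor] == lvl else 0 for s in samples]
            for lvl in others}

def design(terms, levels, samples):
    """Additive design: an intercept, then each term's dummy columns in order."""
    cols = {"Intercept": [1] * len(samples)}
    for term in terms:
        cols.update(dummy_columns(term, levels[term], samples))
    return cols

levels = {"condition": ["CL", "A", "B", "C", "D"],
          "time": ["T1", "T2", "T3", "T4"]}

# One sample per condition/time cell (replicates omitted for brevity).
samples = [{"condition": c, "time": t}
           for c, t in product(levels["condition"], levels["time"])]

ab = design(["condition", "time"], levels, samples)
ba = design(["time", "condition"], levels, samples)

print(list(ab))   # column order for ~ condition + time
print(list(ba))   # column order for ~ time + condition
print(ab == ba)   # dict equality ignores order: same columns, same values
```

This only covers additive designs with treatment coding; it says nothing about how the dispersion estimation or shrinkage steps in DESeq2 behave.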

score 0 · Answer 2 · 2019-02-13

Take a look at the time series example in the DESeq2 workflow. With the code there you can compare conditions at time 0 or at individual time points.

I would suggest that, although it is tempting to answer every question with a statistical test, there are perhaps better or more creative ways to find general patterns in this kind of data, e.g. by clustering the shrunken LFCs of each condition over control across the time points. I don't have any specific R code for you to do the clustering, but in the past I've found the shrunken LFCs provided by DESeq2 to be useful for exploratory downstream analyses like gene clustering.
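
To make "compare conditions at individual time points" concrete, here is a hedged sketch of the bookkeeping in plain Python (again not DESeq2 code). It assumes a design with an interaction, ~ condition + time + condition:time, treatment coding, and CL and T1 as reference levels. The coefficient labels are illustrative only and will not match what `resultsNames()` prints. A comparison of two conditions at a given time is the difference between the two cells' coefficient weights; a numeric vector of this kind, with one entry per coefficient, is one of the forms the `contrast` argument of `results()` accepts.

```python
conditions = ["CL", "A", "B", "C", "D"]   # first level is the reference
times = ["T1", "T2", "T3", "T4"]          # first level is the reference

# Coefficient labels for ~ condition + time + condition:time.
# Illustrative names only, not the labels DESeq2 generates.
coefs = (["Intercept"]
         + [f"cond{c}" for c in conditions[1:]]
         + [f"time{t}" for t in times[1:]]
         + [f"cond{c}:time{t}" for c in conditions[1:] for t in times[1:]])

def cell_mean(cond, time):
    """Weights on each coefficient giving the fitted log-scale mean of one cell."""
    # Terms involving a reference level have no coefficient and drop out.
    active = {"Intercept", f"cond{cond}", f"time{time}", f"cond{cond}:time{time}"}
    return [1 if name in active else 0 for name in coefs]

def contrast(cond_a, cond_b, time):
    """Contrast vector for cond_a vs cond_b at a single time point."""
    return [a - b for a, b in zip(cell_mean(cond_a, time), cell_mean(cond_b, time))]

def show(vec):
    """Keep only the non-zero entries, labelled by coefficient."""
    return {name: w for name, w in zip(coefs, vec) if w != 0}

print(show(contrast("A", "CL", "T1")))  # main effect only
print(show(contrast("A", "CL", "T3")))  # main effect plus interaction
print(show(contrast("C", "A", "T3")))   # arbitrary pair at one time point
```

At the reference time the A vs CL comparison is just the main effect of A; at T3 the corresponding interaction term is added. Comparing two non-reference conditions at T3 takes the difference of both their main effects and their T3 interaction terms.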