Appropriate design formula for DESeq2 from principles
@galenseilis-19612
Canada

Background

I am analyzing RNA-seq reads to look at transcriptomic response to environmental stress, and I want to ensure that I am using the correct design formula for my experimental design. I have a condition factor with a control level and four other levels in no particular order (CL, A, B, C, D), biological sampling was done in triplicate, and there is a timepoint factor with four ordered levels (T1, T2, T3, T4).

My experimental questions

These are the types of questions I am interested in answering:

  1. For each non-control-level condition, accounting for timepoints, are they different from the control level?
  2. For each condition, including the control level, are their timepoints different in expression level?
  3. All else being equal, which genes were differentially expressed with respect to the reference condition?
  4. Accounting for timepoint, are arbitrary pairs (C vs A) of conditions different from each other?

What I've tried

I've read the DESeq2 paper and the vignette, and rummaged through various posts on the BioStars and Bioconductor forums. I've learned a lot from them about the DESeq2 package and the mathematics it performs, but it is still unclear to me how to write a design formula that answers my questions. I've also followed a general tutorial on design formulae in R, but it did not clarify the ordering of terms.

Related Questions

What are the rules for ordering the terms in a design formula for DESeq2? (What does ~ A + B vs ~ B + A mean?) I'd like a description of 'the general case' rather than special cases. I'm not a programming or math phobe, so lay it on me.

With the contrast argument of the results function, how do I similarly make comparisons that are conditioned on other factors?


This isn't an answer to your main question, but for an in-depth discussion of how design matrices are constructed from factors, you should have a read through the vignette for the codingMatrices package. You generally won't be working directly with the design matrix in DESeq2, but it's still useful to understand the principles.
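As a rough illustration of how a factor becomes design-matrix columns, here is a minimal base R sketch. The sample layout is hypothetical (it only mirrors the five conditions in triplicate described in the question), and it uses R's default treatment coding rather than anything specific to DESeq2.

```r
# Hypothetical sample sheet: 5 conditions, 3 replicates each.
# The first level listed ("CL") becomes the reference level.
condition <- factor(rep(c("CL", "A", "B", "C", "D"), each = 3),
                    levels = c("CL", "A", "B", "C", "D"))

# Default treatment coding: an intercept (the CL baseline) plus one
# indicator column for each non-reference level.
mm <- model.matrix(~ condition)
print(colnames(mm))
print(unique(mm))

# Releveling changes the baseline, so every coefficient is now
# interpreted as "level vs A" instead of "level vs CL".
condition_a <- relevel(condition, ref = "A")
mm_a <- model.matrix(~ condition_a)
print(colnames(mm_a))
```

Each non-intercept column corresponds to one "level vs reference" coefficient, which is why the choice of reference level decides what the default comparisons mean.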

swbarnes2 ★ 1.4k
@swbarnes2-14086
San Diego

AFAIK, the ordering of terms doesn't matter if you specify the contrast you desire in your results statement.
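A small base R sketch of why the order should not matter for an additive design: swapping the terms only permutes the design-matrix columns, so each named coefficient comes out the same. The sample sheet and the response `y` are made up for illustration, and `lm` stands in for the GLM that DESeq2 fits. One caveat worth knowing: when `results()` is called without `contrast` or `name`, DESeq2 reports the last variable in the design, so the order does affect that default.

```r
set.seed(1)

# Hypothetical layout from the question: 5 conditions x 4 timepoints x 3 replicates.
samples <- expand.grid(rep = 1:3,
                       time = paste0("T", 1:4),
                       condition = c("CL", "A", "B", "C", "D"))
samples$condition <- factor(samples$condition, levels = c("CL", "A", "B", "C", "D"))
samples$time <- factor(samples$time, levels = paste0("T", 1:4))
samples$y <- rnorm(nrow(samples))  # stand-in response, not real expression data

# Same additive model with the terms in either order.
fit_ct <- lm(y ~ condition + time, data = samples)
fit_tc <- lm(y ~ time + condition, data = samples)

# Align the coefficients by name and compare them.
coef_ct <- coef(fit_ct)
coef_tc <- coef(fit_tc)[names(coef_ct)]
print(all.equal(coef_ct, coef_tc))

# The design matrices hold the same columns, just in a different order.
print(setdiff(colnames(model.matrix(fit_ct)), colnames(model.matrix(fit_tc))))
```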

@mikelove
United States

Take a look at the time series example in the DESeq2 workflow. With the code there you can compare conditions at time 0 or at individual time points. Although it is tempting to answer every question with a statistical test, I would suggest there are perhaps better or more creative ways to find general patterns in this kind of data, e.g. by clustering the shrunken LFCs of each condition over control across the time points. I don't have any specific R code for the clustering, but in the past I've found the shrunken LFCs provided by DESeq2 useful for exploratory downstream analyses such as gene clustering.
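For comparing two conditions at one time point, one possible approach is sketched below in base R. It assumes a design with a condition-by-time interaction, `~ condition + time + condition:time`, which is a choice made here for illustration rather than something stated in the thread. The idea is to write down the design-matrix row for each group and subtract them; the difference is a numeric contrast vector. The `results()` function accepts such a numeric vector as its `contrast` argument, with one element per entry of `resultsNames(dds)`, so the vector would need to be checked against that ordering for a real `dds` object.

```r
conds <- c("CL", "A", "B", "C", "D")
times <- paste0("T", 1:4)

# One row per condition/timepoint combination (20 groups in total).
grid <- expand.grid(condition = factor(conds, levels = conds),
                    time = factor(times, levels = times))

# Assumed design: main effects plus the condition:time interaction.
X <- model.matrix(~ condition + time + condition:time, data = grid)
rownames(X) <- paste(grid$condition, grid$time, sep = "_")

# C vs A at T3: subtract the two group rows. Shared terms (intercept,
# timeT3) cancel, leaving the main effects and T3 interaction terms.
contrast_vec <- X["C_T3", ] - X["A_T3", ]
print(contrast_vec[contrast_vec != 0])
```

The same row-difference trick covers the other questions in the post, for example a condition against CL at a given time point, or two time points within one condition.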
