Question

Most sensible approach to a `stage*condition` design in DESeq2

0

Entering edit mode

alephreish • 0

@2fd8f786

Last seen 14 hours ago

Israel

I have the following experimental design:

four developmental stages, effectively a nominal categorical variable
three conditions: two treatments and one control

The biggest source of variation in the data is stage. The two treatments are conceptually close to each other and the genes responding to them are largely the same, although: one of the treatments has a stronger effect (higher fold changes) and both treatments also elicit specific responses.

What I'm mainly interested in is genes responding to both treatments and genes responding to one but not the other, at each stage.

The approach I have accepted is to unite stage and condition in a single variable group and do Wald tests for contrasts treatment1_vs_control and treatment2_vs_control for each stage separately and apply a universal FDR cutoff. It feels though that the more stage×treatment combinations a gene is discovered to be differentially expressed in, the higher the chances of eventual sub-significant tests to be true discoveries in other combinations as well, which is something I do not take into account. For example, the following gene:

enter image description here

passes my FDR cutoff of 0.05 (the red asterisks) at two stages in treatment1 (blue) and at all stages in treatment2 (yellow), but the sub-significant test for stage C (raw P ~ 0.002) turns out not be significant although the totality of the evidence points to it being differentially expressed here as well albeit with a lower fold change.

My ad hoc idea of dealing with this, is to use two FDRs, say 0.1 and 0.05 in a two-step procedure: if a gene is discovered to be DE in more than some pre-defined number of stage×treatment combinations under the more relaxed cutoff, it is considered to be DE in all of them, otherwise the more stringent cutoff is used.

Is my intuition wrong, is there a better approach to this idea or to my design in general? Thanks!

DESeq2 • 223 views

ADD COMMENT • link 2 days ago • updated 1 day ago alephreish • 0

0

Entering edit mode

Can your developmental stages be defines in terms of a continous time measure (like day 1-3-5 / hour 1-3-5?)? If so, you could also fit a linear (or non-linear) model within deseq for an overview of genes that overall change throughout your developmental curve. You won't get a granular A/B/C/D resolution, but I have a feeling in most cases you care about progression over time, so maybe this will work for you:

dds <- DESeqDataSetFromMatrix(countData = counts,
                                  colData = coldata,
                                  design = ~ treatment + scaled_dev_stage)    # Stage has to be numeric and scaled


keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds [keep,]


dds <-  DESeq(dds , test="LRT", reduced = ~ treatment)    # For genes that show response in both, testing against no change over time
res <- results(dds)

And for genes that show changes in one group and not the other, you could subset the treatment data as 2 df, run the same test separately with:

design  = ~scaled_dev_stage)

...test = "LRT", reduced = ~1)

And then compare the significant subsets in both. Hope that helps

ADD REPLY • link 2 days ago georgii.vdovin • 0

0

Entering edit mode

No, the developmental stages are too different from each other. In fact, based on prior knowledge and gene expression pattern, they show a strong clustering A+C and B+D. The gene I've shown in my post is not representative of the general pattern: most of the DE genes also show similar expression pattern uniting A+C and B+D instead.

ADD REPLY • link 1 day ago alephreish • 0

0

Entering edit mode

Your second idea to split data in two for treatment1 and treatment2 and run LRT sounds very interesting. So given my full design ~ stage * treatment, the reduced design would be ~ stage.

ADD REPLY • link 1 day ago alephreish • 0

score 0 · Answer 1 · 2024-07-03

Doing a stage_treatment grouping is not the same as doing stage treatment. You can use stage treatment to compare one subgroup to another like you are doing, but doing so can require a bit of trickiness in the contrasts. I think the bigger virtue of doing stage*treatment designs is in the more complicated questions it lets you ask.

If you just want to compare one subgroup of samples to another, stage_treatment tends to be far more readable.

If you want to do essentially a 4 group comparison (like what genes change differently in stages A and B when exposed to treatment1), then you should do stage*treatment. This is basically the equivalent of doing stage_treatment for both stages, subtracting the fold changes (which you could do yourself, of course), but the DESeq will give you the proper p-value, and that you can't DIY.

https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions