Question

Should I build the DESeq2 model with all data and then contrast groups, or subset the groups first, then build the model and contrast?

sean.mccutcheon • 0

@b6186e5d

United States

I want to run differential gene expression analyses between the following groups (each with three donors):

Control, Treatment 1, Treatment 2, Treatment 3, Treatment 4, Treatment 5.

Metadata

The goal is to identify DEGs between each treatment and the control. I initially built the dds object with the raw counts from all groups and then specified the pairwise comparisons of interest. However, I soon realized that the number of DEGs for each pairwise comparison is greatly influenced by whether I construct the DESeq2 model with all the data or with only the subset of groups being compared. For some comparisons, building the model with all the data seems to improve the power, whereas other comparisons benefit from first subsetting the specific groups and then building the model.

For example, if I subset the data corresponding to control, treatment 1, and treatment 2, construct the dds model, and then specify the pairwise comparisons, there are 280 and 2,193 DEGs for treatment 1 and treatment 2, respectively, relative to the control. However, the number of DEGs drops to 66 and 125 if I construct the model with all the data and then specify the same comparisons.

This scenario is flipped for other treatments (e.g., there are more DEGs when constructing the model with all the data than when subsetting first).

I am curious what is the best way to approach this problem. Thanks!
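For reference, the "all data" workflow described above can be sketched roughly as follows. This is a minimal sketch, not the code actually used in the post: the helper names `contrast_all` and `count_degs`, the level name `"control"`, and the 0.05 cutoff are all assumptions for illustration. In this setup a single dispersion estimate per gene is shared across every group, which is one reason DEG counts can differ from a subset-first fit.

```r
# Fit one model on every sample, then pull each treatment-vs-control contrast.
# `dds` is assumed to be a DESeqDataSet with a `condition` column in colData;
# "control" as the reference level name is an assumption.
contrast_all <- function(dds, treatments, reference = "control") {
  dds <- DESeq2::DESeq(dds)
  res <- lapply(treatments, function(trt) {
    DESeq2::results(dds, contrast = c("condition", trt, reference))
  })
  setNames(res, treatments)
}

# Count genes passing an adjusted p-value cutoff (0.05 is an assumed
# threshold). NA padj values, e.g. from independent filtering, are ignored.
count_degs <- function(res, alpha = 0.05) {
  sum(res$padj < alpha, na.rm = TRUE)
}
```

With hypothetical level names, `sapply(contrast_all(dds, c("treatment1", "treatment2")), count_degs)` would give the per-comparison DEG counts from the full model, which can then be compared against counts from models fit on subsets.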

DifferentialExpression DESeq2 • 1.5k views

ADD COMMENT • link written 10 months ago by sean.mccutcheon • 0

score 1 · Answer 1 · 2023-06-15

ATpoint ★ 4.0k

@atpoint-13662

Germany

Please see the FAQ: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups

ADD COMMENT • link 10 months ago ATpoint ★ 4.0k

score 0 · Answer 2 · 2023-06-23

Michael Love 41k

@mikelove

United States

Can you post a PCA plot? This will help inform how you set up the analysis.

ADD COMMENT • link 10 months ago Michael Love 41k

Hi Mike, I have posted the PCA plots.

It is interesting. If I subset the untreated samples with single treatments B, Z, or G, PC1 is explained by the donor, while PC2 is influenced by the specific treatment. I have routinely seen this before in other experiments with these cells. However, the variance for the dual treatments (BZ or BG; where BZ indicates treatment with both B and Z) is explained predominantly by the treatment rather than the donor, especially for BG.

I noticed that donor 2 (treatment 3) in the bottom left plot is far away from all other samples. This is also reflected in the PCA plot with all samples. I wonder if this sample is increasing the variance of the dataset and thus reducing the power for some comparisons.

I am curious what you think is the best approach for this. Thanks!

I have set up the design for the dds object accordingly:

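library(DESeq2)
# Design models condition and donor; tidy = TRUE means the first column of `counts` holds the gene IDs
dds = DESeqDataSetFromMatrix(countData = counts, colData = metadata,
                             design = ~ condition + donor, tidy = TRUE)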

PCA plot

ADD REPLY • link 10 months ago sean.mccutcheon • 0

What is the relationship between donor X across treatments? Is donor 1 in treatment 1 the same as donor 1 in treatment 2?

ADD REPLY • link 9 months ago Michael Love 41k

Yes. d1, d2, and d3 are the same donors across all treatments.

ADD REPLY • link 9 months ago sean.mccutcheon • 0

Looking at the PCA, it seems like in the top row, treatment effect >> donor effect or variability, so those would benefit the most from subsetting. The bottom row (BZ and BG) has more substantial variability. I would recommend subsetting for all the comparisons on this dataset, to avoid losing power on the treatments with strong effects. BZ and/or BG may require more replicates to assess the effect of treatment.
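A rough sketch of what subsetting per comparison could look like is below. It is illustrative only: the toy `metadata`, the level names (`control`, `treatment1`, ...), and the helpers `subset_pair` and `fit_pair` are hypothetical, and `counts` is assumed to be a plain gene-by-sample integer matrix whose column names match the metadata row names (so no `tidy = TRUE`). Because the donors are the same across treatments, the design keeps `donor` as a blocking term.

```r
# Hypothetical toy metadata mirroring the thread: 6 conditions x 3 donors,
# with the same donors in every condition.
metadata <- expand.grid(
  donor = paste0("d", 1:3),
  condition = c("control", paste0("treatment", 1:5)),
  stringsAsFactors = TRUE
)
rownames(metadata) <- paste(metadata$condition, metadata$donor, sep = "_")

# Keep only the reference plus one treatment, and drop unused factor levels
# so the design matrix has no empty columns.
subset_pair <- function(metadata, treatment, reference = "control") {
  keep <- metadata$condition %in% c(reference, treatment)
  sub <- droplevels(metadata[keep, , drop = FALSE])
  sub$condition <- relevel(sub$condition, ref = reference)
  sub
}

# Fit a paired model on the subset only. Requires DESeq2 when called.
# `counts` is assumed to be a gene x sample integer matrix with
# colnames(counts) matching rownames(metadata).
fit_pair <- function(counts, metadata, treatment, reference = "control") {
  sub <- subset_pair(metadata, treatment, reference)
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts[, rownames(sub)],
    colData = sub,
    design = ~ donor + condition
  )
  dds <- DESeq2::DESeq(dds)
  DESeq2::results(dds, contrast = c("condition", treatment, reference))
}
```

Each call such as `fit_pair(counts, metadata, "treatment2")` then estimates dispersions from only the six samples in that comparison, so a highly variable group elsewhere in the experiment does not inflate them.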