I have grown diseased cells (with 3 different genotypes, type A, B, and C) and healthy cells (WT) and put them in two different culture dimensionalities, 2D and 3D culture. I then have 8 groups: 2D healthy, 3D healthy, 2D diseased (of genotype A, B, C) and 3D diseased (of genotype A, B, C). Additionally, the samples are derived from two experimental iterations (E1 and E2), performed at different times.
The experimental design then looks like this:
> coldata
SampleName Experiment condition genotype
1 E1_2D_A_1 E1 2D A
2 E1_2D_A_2 E1 2D A
3 E1_3D_A_1 E1 3D A
4 E1_3D_A_2 E1 3D A
5 E2_2D_A_1 E2 2D A
6 E2_2D_A_2 E2 2D A
7 E2_3D_A_1 E2 3D A
8 E2_3D_A_2 E2 3D A
9 E1_2D_B_1 E1 2D B
10 E1_3D_B_1 E1 3D B
11 E1_3D_B_2 E1 3D B
12 E2_2D_B_1 E2 2D B
13 E2_2D_B_2 E2 2D B
14 E2_3D_B_1 E2 3D B
15 E2_3D_B_2 E2 3D B
16 E1_2D_C_1 E1 2D C
17 E1_2D_C_1 E1 2D C
18 E1_3D_C_1 E1 3D C
19 E1_3D_C_2 E1 3D C
20 E1_3D_C_1 E1 3D C
21 E1_3D_C_2 E1 3D C
22 E2_2D_C_1 E2 2D C
23 E2_2D_C_1 E2 2D C
24 E2_2D_C_2 E2 2D C
25 E2_3D_C_1 E2 3D C
26 E2_3D_C_2 E2 3D C
27 E2_3D_C_1 E2 3D C
28 E2_3D_C_2 E2 3D C
29 E1_2D_WT_1 E1 2D WT
30 E1_3D_WT_1 E1 3D WT
31 E1_3D_WT_2 E1 3D WT
32 E2_2D_WT_1 E2 2D WT
33 E2_2D_WT_2 E2 2D WT
34 E2_2D_WT_1 E2 2D WT
35 E2_2D_WT_2 E2 2D WT
36 E2_3D_WT_1 E2 3D WT
37 E2_3D_WT_2 E2 3D WT
38 E2_3D_WT_3 E2 3D WT
39 E2_3D_WT_1 E2 3D WT
40 E2_3D_WT_2 E2 3D WT
First I did comparisons within each culture condition by grouping condition and genotype and using lfcshrink/ashr. For instance for genotype A:
dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ Experiment + condition + genotype)
dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)
#2D contrast
A_2D_v_WT_2D_lfcShrink <- lfcShrink(dds, contrast=c("group","A2D","WT2D"), type = "ashr", alpha = 0.05)
#3D contrast
A_2D_v_WT_2D_lfcShrink <- lfcShrink(dds, contrast=c("group","A3D","WT3D"), type = "ashr", alpha = 0.05)
I then ran GSEA observed that some disease-related pathways in 3D have much smaller p-values than the 2D comparison. I looked deeper into why this might be by looking at genes' log2FCs that represent those pathways. Turns out that some of the log2FCs are larger in the 3D comparison than the 2D one. The question is now: for a given gene, how do I determine if the log2FC in the 3D comparison is statistically larger than the log2FC in the 2D comparison?
After reading the vignette and exploring other people's questions on support sites, I think it is possible to do this in DESeq2, namely to compare two contrasts. In my case, between 3D and 2D within the same genotype, normalized to their respective WT groups by use of a design like design = ~ Experiment + condition + genotype + condition:genotype
However, I would like verification that this is the right way to go to answer the question I have. If so, I need help on how to go from the design into making the contrast successfully since I am a novice. I have tried to follow the "Interactions" section in the 10.13.2020 DESeq2 guide and ?results
, but unsure how to confirm the function it is making the comparisons I think it is making. Explicitly, how to make sure the contrast is using the correct reference level (either WT in 3D, or WT in 2D for A in 3D and A in 2D, respectively).
Thank you in advance for your time and help.
Hi Michael,
Thank you for the quick response! I am still having issues with this though.
First, if I include the factor 'group' made by
into my
coldata()
then runI get the error
But if I first exclude that factor
group
from inputcoldata()
, then runI do not get an error message, but consequently now
group
makes useless terms forresultNames()
and now cannot help me answer my question. For example doing a contrast (A in 3D vs WT in 3D) vs (A in 2D vs WT in 2D).Using
dds$condition <- relevel(dds$genotype, ref = "WT")
doesn't change myresultNames()
output. If I can get the levelWT
in my factorgenotype
to be the reference, that would suit both comparisons.However, this would not work for the factor
condition
, since it needs to be dynamic depending on the contrast. Taking the example above again: (A in 3D vs WT in 3D) vs (A in 2D vs WT in 2D), the reference level would need to be3D
in the first part, but then2D
in the second part.Thank you!
So I gave some general direction, but you may need to work with a local statistician to choose an appropriate design for your experiment. Unfortunately, I don't have time to provide general statistical consultation for users on the support site, but have to restrict myself to DESeq2 specific software questions.
Sounds good, I understand. Thank you for the time you have provided, it is much appreciated. Take care.