DESeq2 interaction design with factor containing multiple levels
1
0
Entering edit mode
Nick F • 0
@nick-f-24379
Last seen 12 months ago
United States

I have grown diseased cells (with 3 different genotypes, type A, B, and C) and healthy cells (WT) and put them in two different culture dimensionalities, 2D and 3D culture. I then have 8 groups: 2D healthy, 3D healthy, 2D diseased (of genotype A, B, C) and 3D diseased (of genotype A, B, C). Additionally, the samples are derived from two experimental iterations (E1 and E2), performed at different times.

The experimental design then looks like this:

> coldata
   SampleName Experiment condition genotype
1   E1_2D_A_1         E1        2D        A
2   E1_2D_A_2         E1        2D        A
3   E1_3D_A_1         E1        3D        A
4   E1_3D_A_2         E1        3D        A
5   E2_2D_A_1         E2        2D        A
6   E2_2D_A_2         E2        2D        A
7   E2_3D_A_1         E2        3D        A
8   E2_3D_A_2         E2        3D        A
9   E1_2D_B_1         E1        2D        B
10  E1_3D_B_1         E1        3D        B
11  E1_3D_B_2         E1        3D        B
12  E2_2D_B_1         E2        2D        B
13  E2_2D_B_2         E2        2D        B
14  E2_3D_B_1         E2        3D        B
15  E2_3D_B_2         E2        3D        B
16  E1_2D_C_1         E1        2D        C
17  E1_2D_C_1         E1        2D        C
18  E1_3D_C_1         E1        3D        C
19  E1_3D_C_2         E1        3D        C
20  E1_3D_C_1         E1        3D        C
21  E1_3D_C_2         E1        3D        C
22  E2_2D_C_1         E2        2D        C
23  E2_2D_C_1         E2        2D        C
24  E2_2D_C_2         E2        2D        C
25  E2_3D_C_1         E2        3D        C
26  E2_3D_C_2         E2        3D        C
27  E2_3D_C_1         E2        3D        C
28  E2_3D_C_2         E2        3D        C
29 E1_2D_WT_1         E1        2D       WT
30 E1_3D_WT_1         E1        3D       WT
31 E1_3D_WT_2         E1        3D       WT
32 E2_2D_WT_1         E2        2D       WT
33 E2_2D_WT_2         E2        2D       WT
34 E2_2D_WT_1         E2        2D       WT
35 E2_2D_WT_2         E2        2D       WT
36 E2_3D_WT_1         E2        3D       WT
37 E2_3D_WT_2         E2        3D       WT
38 E2_3D_WT_3         E2        3D       WT
39 E2_3D_WT_1         E2        3D       WT
40 E2_3D_WT_2         E2        3D       WT

First I did comparisons within each culture condition by grouping condition and genotype and using lfcshrink/ashr. For instance for genotype A:

dds <- DESeqDataSetFromMatrix(countData = countdata,
                              colData = coldata,
                              design = ~ Experiment + condition + genotype)

dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group
dds <- DESeq(dds)

#2D contrast
A_2D_v_WT_2D_lfcShrink <- lfcShrink(dds, contrast=c("group","A2D","WT2D"), type = "ashr", alpha = 0.05)
#3D contrast
A_2D_v_WT_2D_lfcShrink <- lfcShrink(dds, contrast=c("group","A3D","WT3D"), type = "ashr", alpha = 0.05)

I then ran GSEA observed that some disease-related pathways in 3D have much smaller p-values than the 2D comparison. I looked deeper into why this might be by looking at genes' log2FCs that represent those pathways. Turns out that some of the log2FCs are larger in the 3D comparison than the 2D one. The question is now: for a given gene, how do I determine if the log2FC in the 3D comparison is statistically larger than the log2FC in the 2D comparison?

After reading the vignette and exploring other people's questions on support sites, I think it is possible to do this in DESeq2, namely to compare two contrasts. In my case, between 3D and 2D within the same genotype, normalized to their respective WT groups by use of a design like design = ~ Experiment + condition + genotype + condition:genotype

However, I would like verification that this is the right way to go to answer the question I have. If so, I need help on how to go from the design into making the contrast successfully since I am a novice. I have tried to follow the "Interactions" section in the 10.13.2020 DESeq2 guide and ?results, but unsure how to confirm the function it is making the comparisons I think it is making. Explicitly, how to make sure the contrast is using the correct reference level (either WT in 3D, or WT in 2D for A in 3D and A in 2D, respectively).

Thank you in advance for your time and help.

deseq2 • 1.4k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

Yes, the way to ask if a LFC (say associated with condition) is larger in one group than another (say stratified by a variable group) is: ~group + condition + group:condition. If the interaction effect size is ~0 this means the LFC are similar, if it is larger than zero this means that the LFC is larger in the non-reference group, etc.

If you want to check your intuition, look at genes with a significant interaction term and then look at the plotCounts for those genes.

Re: what reference level is used, this is based on levels(dds$condition) which the user has control over (see vignette note on factor levels), or you can also just look at resultsNames(dds) to see what the reference levels were set to.

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thank you for the quick response! I am still having issues with this though.

First, if I include the factor 'group' made by

    dds$group <- factor(paste0(dds$genotype, dds$condition))
    design(dds) <- ~ group

into my coldata() then run

dds <- DESeqDataSetFromMatrix(countData = countdata,
                              colData = coldata,
                              design = ~ Experiment + group + condition + genotype + condition:genotype)

I get the error

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

But if I first exclude that factor group from input coldata(), then run

dds$group <- factor(paste0(dds$genotype, dds$condition))
design(dds) <- ~ group

I do not get an error message, but consequently now group makes useless terms for resultNames() and now cannot help me answer my question. For example doing a contrast (A in 3D vs WT in 3D) vs (A in 2D vs WT in 2D).

Using dds$condition <- relevel(dds$genotype, ref = "WT") doesn't change my resultNames() output. If I can get the level WT in my factor genotype to be the reference, that would suit both comparisons.

However, this would not work for the factor condition, since it needs to be dynamic depending on the contrast. Taking the example above again: (A in 3D vs WT in 3D) vs (A in 2D vs WT in 2D), the reference level would need to be 3D in the first part, but then 2D in the second part.

Thank you!

ADD REPLY
0
Entering edit mode

So I gave some general direction, but you may need to work with a local statistician to choose an appropriate design for your experiment. Unfortunately, I don't have time to provide general statistical consultation for users on the support site, but have to restrict myself to DESeq2 specific software questions.

ADD REPLY
0
Entering edit mode

Sounds good, I understand. Thank you for the time you have provided, it is much appreciated. Take care.

ADD REPLY

Login before adding your answer.

Traffic: 918 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6