Question

combining columns for experimental design in DESeq2

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Germany

I have a dataset with samples from two days (D0 and D3), two cell groups (high and low), and two conditions (WT and KO), each in duplicate (see table below).

    names             day  condition  differentiated  rep
1   D0_MAEAKO_High_1  D0   KO         high            1
2   D0_MAEAKO_High_2  D0   KO         high            2
3   D0_MAEAKO_Low_1   D0   KO         low             1
4   D0_MAEAKO_Low_2   D0   KO         low             2
5   D0_WT_High_1      D0   WT         high            1
6   D0_WT_High_2      D0   WT         high            2
7   D0_WT_Low_1       D0   WT         low             1
8   D0_WT_Low_2       D0   WT         low             2
9   D3_MAEAKO_High_1  D3   KO         high            1
10  D3_MAEAKO_High_2  D3   KO         high            2
11  D3_MAEAKO_Low_1   D3   KO         low             1
12  D3_MAEAKO_Low_2   D3   KO         low             2
13  D3_WT_High_1      D3   WT         high            1
14  D3_WT_High_2      D3   WT         high            2
15  D3_WT_Low_1       D3   WT         low             1
16  D3_WT_Low_2       D3   WT         low             2

I would like to analyze the two days separately and, within each day, compare the low and high groups against each other as well as KO against WT.

Would it make more sense for that purpose to create a new column in my sample table that concatenates the columns day, condition and differentiated, giving levels like D0_KO_high, D0_KO_low, etc.?

Or would a design formula of ~ day + condition + differentiated give me the same results?

Thanks, Assa

deseq2 design metadata • 1.3k views

updated 4.3 years ago by sebastian.lobentanzer ▴ 50 • written 4.3 years ago by Assa Yeroslaviz ★ 1.5k

score 2 · Accepted Answer · 2020-01-08

Hi Assa, no, the two designs are not equivalent. I would use the first one (the combined column) to compare two specific groups, e.g. if you want to know the difference between WT and KO on D3 within each of the cell groups. Most often, this is what you want.
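To make the first option concrete, here is a minimal sketch, not a definitive recipe. The sample table is rebuilt from the columns in the question; the column name `group`, the sample names, and the simulated counts from `makeExampleDESeqDataSet()` are assumptions for illustration only, and the DESeq2 part runs only if the package is installed.

```r
# Sample table mirroring the question: 2 days x 2 conditions x 2 cell groups x 2 replicates.
coldata <- expand.grid(
  rep = 1:2,
  differentiated = c("high", "low"),
  condition = c("KO", "WT"),
  day = c("D0", "D3"),
  stringsAsFactors = TRUE
)

# Hypothetical combined factor: one level per day/condition/differentiation combination.
coldata$group <- factor(paste(coldata$day, coldata$condition,
                              coldata$differentiated, sep = "_"))
rownames(coldata) <- paste(coldata$group, coldata$rep, sep = "_")
print(levels(coldata$group))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated counts stand in for the real count matrix (assumption).
  cts <- counts(makeExampleDESeqDataSet(n = 500, m = nrow(coldata)))
  colnames(cts) <- rownames(coldata)

  dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                                design = ~ group)
  dds <- DESeq(dds)

  # KO vs WT on D3, within the "high" cells only.
  res <- results(dds, contrast = c("group", "D3_KO_high", "D3_WT_high"))
  print(head(res))
}
```

Any other pairwise comparison (e.g. high vs low within D0 WT) is the same `results()` call with two different levels of `group`.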

With the second one, on the other hand, you can look at one of the variables across all other conditions. For example, if you want to find genes that are consistently different between WT and KO, you would use a contrast of c("condition", "WT", "KO") with your second design. This way, you are "correcting" for the influence of day and differentiation, the same as you would for batches. What you find in the end depends on the expression patterns of your genes.
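A rough sketch of the additive design, under the same assumptions as before (sample table rebuilt from the question, simulated counts, DESeq2 part guarded so it only runs if the package is installed). Setting WT as the reference level is my choice here, not something stated in the thread; `model.matrix()` is used only to show which coefficients the formula produces.

```r
coldata <- expand.grid(
  rep = 1:2,
  differentiated = c("high", "low"),
  condition = c("KO", "WT"),
  day = c("D0", "D3"),
  stringsAsFactors = TRUE
)
# Assumption: WT is the baseline condition.
coldata$condition <- relevel(coldata$condition, ref = "WT")
rownames(coldata) <- paste(coldata$day, coldata$condition,
                           coldata$differentiated, coldata$rep, sep = "_")

# One coefficient per variable: a single KO effect shared by both days and both cell groups.
mm <- model.matrix(~ day + differentiated + condition, data = coldata)
print(colnames(mm))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated counts stand in for the real count matrix (assumption).
  cts <- counts(makeExampleDESeqDataSet(n = 500, m = nrow(coldata)))
  colnames(cts) <- rownames(coldata)

  dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                                design = ~ day + differentiated + condition)
  dds <- DESeq(dds)

  # WT vs KO, adjusted for day and differentiation (log2 fold change is WT over KO).
  res <- results(dds, contrast = c("condition", "WT", "KO"))
  print(head(res))
}
```

Because the model has only one condition coefficient, this design cannot give a KO vs WT result that is specific to D0 or to D3.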

You could also go in between by combining two of the variables into one factor and leaving the third separate. Lastly, you could use an interaction design, which is explained nicely in the DESeq2 vignette, but often this is not necessary for the biological question at hand.
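For completeness, a sketch of the last two options, again with a rebuilt sample table and simulated counts as assumptions. The column name `day_cond` is hypothetical, and which two variables to combine is only one possible choice. The coefficient name passed to `results()` assumes D0 and WT are the reference levels; check `resultsNames()` on your own object.

```r
coldata <- expand.grid(
  rep = 1:2,
  differentiated = c("high", "low"),
  condition = c("WT", "KO"),   # WT listed first so it becomes the reference level
  day = c("D0", "D3"),
  stringsAsFactors = TRUE
)
rownames(coldata) <- paste(coldata$day, coldata$condition,
                           coldata$differentiated, coldata$rep, sep = "_")

# In-between option: combine day and condition, keep differentiation as its own term.
coldata$day_cond <- factor(paste(coldata$day, coldata$condition, sep = "_"))
mm_mixed <- model.matrix(~ differentiated + day_cond, data = coldata)

# Interaction option: does the KO effect differ between D0 and D3?
mm_int <- model.matrix(~ differentiated + day + condition + day:condition,
                       data = coldata)
print(colnames(mm_int))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)
  # Simulated counts stand in for the real count matrix (assumption).
  cts <- counts(makeExampleDESeqDataSet(n = 500, m = nrow(coldata)))
  colnames(cts) <- rownames(coldata)

  dds_mixed <- DESeq(DESeqDataSetFromMatrix(cts, coldata,
                                            design = ~ differentiated + day_cond))
  # KO vs WT on D3, adjusted for differentiation.
  print(head(results(dds_mixed, contrast = c("day_cond", "D3_KO", "D3_WT"))))

  dds_int <- DESeq(DESeqDataSetFromMatrix(
    cts, coldata, design = ~ differentiated + day + condition + day:condition))
  print(resultsNames(dds_int))
  # Interaction term: change in the KO effect from D0 to D3.
  print(head(results(dds_int, name = "dayD3.conditionKO")))
}
```

These two formulas describe the same fitted model with different parameterizations, so the choice mainly affects which comparisons are convenient to extract.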