Question

DESeq2: multiple genotypes time course

1

Entering edit mode

Matteo Perino ▴ 10

@matteo-perino-14913

Last seen 6.3 years ago

Radboud University

Hallo,

I have situation pretty similar to C: DESeq2 likelihood ratio test (LRT) design - 2 genotypes, 4 time points, with the only difference of having 3 genotypes (WT, 2 different KO) and 3 time points (Day0, day4, day7)

>experimental_design

                           line  day replicate
WT_day0_A       WT day0   A
WT_day0_B       WT day0   B
WT_day0_C       WT day0   C
WT_day4_A       WT day4   A
WT_day4_B       WT day4   B
WT_day4_C       WT day4   C
WT_day7_A       WT day7   A
WT_day7_B       WT day7   B
WT_day7_C       WT day7   C
KO_A_day0_A   KO_A day0   A
KO_A_day0_B   KO_A day0   B
KO_A_day0_C   KO_A day0   C
KO_A_day4_A   KO_A day4   A
KO_A_day4_B   KO_A day4   B
KO_A_day4_C   KO_A day4   C
KO_A_day7_A   KO_A day7   A
KO_A_day7_B   KO_A day7   B
KO_A_day7_C   KO_A day7   C
KO_B_day0_A   KO_B day0   A
KO_B_day0_B   KO_B day0   B
KO_B_day0_C   KO_B day0   C
KO_B_day4_A   KO_B day4   A
KO_B_day4_B   KO_B day4   B
KO_B_day7_A   KO_B day7   A
KO_B_day7_B   KO_B day7   B
KO_B_day7_C   KO_B day7   C

I'm interested in :

1) the DEGs among the different genotypes at any given time point

2) the genes whose temporal expression is perturbed by the genotype

for point 1)

As explained in the linked post and the vignette i can use

dds$group <- factor(paste0(dds$line, dds$day))

and then use the contrast parameter in the results function

WaldresultsA <- results(dgeWALD, contrast=c("group", "KO_A_day0", "WT_day0"))

WaldresultsB <- results(dgeWALD, contrast=c("group", "KO_B_day0", "WT_day0"))

WaldresultsAB <- results(dgeWALD, contrast=c("group", "KO_A_day0", "KO_B_day0"))

and I would repeat it for all the time points.

Is there a faster way that I'm missing to avoid doing 9 pairwise comparison independently?

for point 2)

I'm using a LRT test as suggested for time point analysis with full design ~line + day + line:day and a reduced design ~line + day that should identify genes with an altered expression pattern across time. So far so good.

I did so with:

ddsMat <- DESeqDataSetFromMatrix(countData = dataset, colData = experimental_design, design = ~line+day+line:day)

ddsMatLRT <- DESeq(ddsMat, test = "LRT", reduced = ~line+day)

res_LRT_IHW <- results(ddsMatLRT, filterFun=ihw)

What I'm unsure about is how interpret the results:

> head(res_LRT_IHW)

log2 fold change (MLE): lineKO_B.dayday7
LRT p-value: '~ line + day + line:day' vs '~ line + day'
DataFrame with 6 rows and 7 columns
                       baseMean log2FoldChange     lfcSE       stat
                      <numeric>      <numeric> <numeric>  <numeric>
ENSMUSG00000051951.5 22.5165766     -1.6236271  0.975098 39.1489484
ENSMUSG00000103377.1  0.8807975     -3.7877553  3.469949  3.4807201
ENSMUSG00000104017.1  0.5717264     -1.2395309  4.924453  0.6416225
ENSMUSG00000103201.1  0.7480610     -1.7446142  8.775683  1.7322564
ENSMUSG00000103161.1  2.0269348     -1.8764214  2.680503  5.6762745
ENSMUSG00000102331.1  5.0319818      0.2647875  2.085465  2.2124272
                           pvalue         padj    weight
                        <numeric>    <numeric> <numeric>
ENSMUSG00000051951.5 6.489969e-08 3.847244e-07 1.7927481
ENSMUSG00000103377.1 4.808159e-01 1.000000e+00 0.0000000
ENSMUSG00000104017.1 9.583281e-01 1.000000e+00 0.0000000
ENSMUSG00000103201.1 7.848502e-01 1.000000e+00 0.0000000
ENSMUSG00000103161.1 2.246638e-01 4.802901e-01 0.8586256
ENSMUSG00000102331.1 6.967548e-01 6.594048e-01 1.7711035

my questions are:

what does this line means?

log2 fold change (MLE): lineKO_B.dayday7

reported fold change only goes for line KO_B at day 7? and compared to? KO_B day4, KO_B day 0 or WT day 7? Does it apply to pvalues too?

How do I know which KO line has the deregulated expression that triggers a low pvalue?

How do I know if this difference was detected in one or more time points?

How do I find which one they are?

Does it need to be done one comparison at a time using the contrast option? how? (I hope not, my idea was exactly to avoid performing an endless list of pairwise comparisons..)

also, I had a look at this

> resultsNames(ddsMatLRT)
[1] "Intercept"          "line_KO_A_vs_WT"  "line_KO_B_vs_WT"
[4] "day_day4_vs_day0"   "day_day7_vs_day0"   "lineKO_A.dayday4"
[7] "lineKO_B.dayday4" "lineKO_A.dayday7"  "lineKO_B.dayday7"

isn't it missing a day7 vs day4 comparison? I guess not, so most probably I am the one missing something..

I understand that these are probably questions arising from a lack of understanding of how the results() function works (at least). I tried to follow the vignette, the workflow that is linked in it at the section for time courses, and answers to other questions but I can't wrap my head around it..

PS

For a comparison between the two KO, should I relevel the design on one of them or the comparison is already in the generated table?

Thanks a LOT!

Matteo

deseq2 • 1.3k views

ADD COMMENT • link updated 6.3 years ago by Michael Love 42k • written 6.3 years ago by Matteo Perino ▴ 10

score 1 · Answer 1 · 2018-02-01

Point 1: this code is correct. We don’t have a way to automate many comparisons.

Regarding the LFC following a LRT, check the frequently asked questions in the vignette.

You may benefit from asking some of these questions to a statistical collaborator about how LRT procedures work. It doesn’t single out anything like which time point is/are different.

To compare the two KO, you should use contrast. There is an example in ?results showing this.