Question

DESeq2 time series contrasts between timepoints

2

Entering edit mode

gtho123 ▴ 40

@gtho123-8872

Last seen 6.6 years ago

New Zealand

I have a RNA-Seq time series with two conditions (SD and LD) and 3 time points (0, 2, and 4). There are three biological replicates per sample. I have used DESeq2 to do a likelihood ratio test on the following designs:

full ~ condition + time + condition:time
reduced ~ condition + time

I understand that this tests for the effects of the two conditions on the pattern of expression which is what I want. Here are the names of my coefficients:

> resultsNames(dds_test)
[1] "Intercept"          "condition_LD_vs_SD" "time_2_vs_0"        "time_4_vs_0"        "conditionLD.time2"
[6] "conditionLD.time4"

I would also like to contrast the difference between the conditions at the three time points and between time points within each condition. I was hoping someone could confirm my interpretations and answer a question about comparing time points which are not the baseline.

So am I correct in thinking I can test the difference between the conditions at the three timepoints like so:

results(dds_test, name="condition_LD_vs_SD", test="Wald")
results(dds_test, contrast=list(c("condition_LD_vs_SD","conditionLD.time2")), test="Wald")
results(dds_test, contrast=list(c("condition_LD_vs_SD","conditionLD.time4")), test="Wald")

and between time points within each condition like so:

Condition SD

results(dds_test, contrast=list(c("time_2_vs_0")), test="Wald") 
results(dds_test, contrast=list(c("time_4_vs_0")), test="Wald")

Condition LD

results(dds_test, contrast=list(c("time_2_vs_0", "conditionLD.time2")), test="Wald") results(dds_test, contrast=list(c("time_4_vs_0", "conditionLD.time4")), test="Wald")

Is this correct and is there an easy way to calculate and test the fold changes between time points 2 and 4?

This is my sample table:

           sample condition time replicate
LD_ZT4_1 LD_ZT4_1        LD    4         1
LD_ZT4_2 LD_ZT4_2        LD    4         2
LD_ZT4_3 LD_ZT4_3        LD    4         3
SD_ZT4_1 SD_ZT4_1        SD    4         1
SD_ZT4_2 SD_ZT4_2        SD    4         2
SD_ZT4_3 SD_ZT4_3        SD    4         3
SD_ZT0_1 SD_ZT0_1        SD    0         1
SD_ZT0_2 SD_ZT0_2        SD    0         2
SD_ZT0_3 SD_ZT0_3        SD    0         3
SD_ZT2_1 SD_ZT2_1        SD    2         1
SD_ZT2_2 SD_ZT2_2        SD    2         2
SD_ZT2_3 SD_ZT2_3        SD    2         3
LD_ZT0_1 LD_ZT0_1        LD    0         1
LD_ZT0_2 LD_ZT0_2        LD    0         2
LD_ZT0_3 LD_ZT0_3        LD    0         3
LD_ZT2_1 LD_ZT2_1        LD    2         1
LD_ZT2_2 LD_ZT2_2        LD    2         2
LD_ZT2_3 LD_ZT2_3        LD    2         3

rna-seq deseq2 • 6.1k views

ADD COMMENT • link updated 3.1 years ago by AZ ▴ 30 • written 7.1 years ago by gtho123 ▴ 40

score 3 · Answer 1 · 2017-10-01

3

Entering edit mode

Michael Love 42k

@mikelove

Last seen 14 hours ago

United States

I'll jump to the answers. The differences between conditions at each time point are given by the following:

condition_LD_vs_SD - for time point 0
main effect + conditionLD.time2 - for time point 2
main effect + conditionLD.time4 - for time point 4

You can take a look at the diagram of interactions terms in the DESeq2 vignette to see how this is the case.

To build results tables for each of these, you should use:

results(dds, name="condition_LD_vs_SD", test="Wald")

And for the other two you need to do, e.g.:

results(dds, contrast=list(c("condition_LD_vs_SD","conditionLD.time4")), test="Wald")

You can also test between time points for each condition. For the reference level of condition (SD), these are given by the main effect terms. For condition LD, you add the main effect term and the relevant interaction term. So time 4 vs time 0 in LD is:

results(dds, contrast=list(c("time_4_vs_0", "conditionLD.time4")), test="Wald")

While for SD it is simply:

results(dds, name="time_4_vs_0", test="Wald")

For making all these results tables, please note that the correction for multiple testing by the results() function is only *within* a results table across genes. Usually only one or a few results tables are generated for an experiment, where we assume the results of all will be reported. If a user wants to make many results table, but then only selectively report gene sets from a subset of the tables, then we recommend that the user perform multiple test correction over all generated p-values. We don't have a function for this last step in DESeq2 but it can be done with R code.

ADD COMMENT • link 7.1 years ago Michael Love 42k

0

Entering edit mode

Note, I updated the answer above.

ADD REPLY • link 7.1 years ago Michael Love 42k

0

Entering edit mode

how to find the difference at time 4 vs 2 in each condition ?

something like this 
results(dds, contrast=list(c("time_4_vs_2", "conditionLD.time4")), test="Wald")

ADD REPLY • link 6.0 years ago dsl89125 • 0

0

Entering edit mode

There won’t be such a coefficient. This is possible but a bit difficult with the interaction design. Better to combine conditon and time variables into one called “group”, see Interaction section of vignette.

ADD REPLY • link 6.0 years ago Michael Love 42k

0

Entering edit mode

Hello Michael! I used the two methods to get the DEG list, namely the "group" and the interaction design. But the results of the two methods are different. the number of DEGs using the interaction design is larger. Is this normal? many thanks for your help.

ADD REPLY • link 3.6 years ago Y. • 0

0

Entering edit mode

Can you plot the LFC? Are they comparable? If they are not highly correlated, I would guess you are not comparing similar things.

ADD REPLY • link 3.6 years ago Michael Love 42k

0

Entering edit mode

Yes, You are correct. I'm not comparing the same thing. Sorry for my negligence. I have another question. Attached is my metadata. The most interesting factors I want to compare are treatment and day, but the genotype and donor also seem to contribute to the variation. I want to ask if I can get the right answer by combining the treatment and day into "group", ignoring the factor of genotype and donor.

PS. It seems that the additional genotype and donor in the interaction design is responsible for the larger DEGs.

Thanks again.

ADD REPLY • link 3.6 years ago Y. • 0

0

Entering edit mode

For choosing an appropriate design, I'd recommend to collaborate with a statistician. I unfortunately only have time here for software questions.

ADD REPLY • link 3.6 years ago Michael Love 42k

0

Entering edit mode

Sorry. I'm in a medical school, and don't know a statistician. Can you link me to some materials?

ADD REPLY • link 3.6 years ago Y. • 0

0

Entering edit mode

My role here is to provide software support for my Bioconductor packages.

I have many other commitments, so I have to protect my time, while still handling the large number of support requests I receive every week.

If you are in a medical school, there is likely someone at your institution you could consult with.