Time series experiment analysis in DEseq2 without T=0 and three independent experiments with linked samples
1
0
Entering edit mode
Dorien • 0
@dorien-22736
Last seen 21 months ago

Hello,

I was hoping some of you could help me with the following question about the design formula as input for DEseq2. I am analysing a knock-down experiment with of 3 biological replicates. For each experiment, samples have been taken after selection at 5 different timepoints of cells that were transduced with either control shRNA, or shRNA 1147 or shRNA 1150. The important point here is that the first sample is taken after about 5 days after the cells first started expressing the shRNA: at timepoint 24hrs, there should therefore already be an interesting difference between the samples transduced with control versus the samples transduced with the SH1147 or SH1150.

In addition, for each biological replicate of the experiment, the samples are linked: RNA is taken from the same cells 24hrs, 48hrs, up to 120 hrs. To account for the repeated measures, I have information about which run of the experiment it was (the rep column below). rep1 at 24hrs with SH1147 is derived the same cells as rep1 at 48hrs with SH1147 (but 24hrs later).However, rep1 of SH1150 are obviously different cells. So the rep-number is only within the treatment group.

I am struggling how I should incorporate these features in my design. I have been reading the Bioconductor support for RNAseq and the example analysis and some other troubleshooting I found here but I did not get a full understanding of how to incorporate those two features of my dataset into my design. I am finding that whatever I am doing at the moment, the analysis picks a baseline for each variable I include in the formula. Whereas in my experiment rep1 is not a baseline for rep 2 and 3; and 24 hrs is not a baseline for 48 or 72 hrs.

The design is thus as follows (I've edited the original because the filenames are super long):

sample_name     condition   timepoint   rep
A       SH1147  24  rep1
B       SH1147  24  rep2
C       SH1147  24  rep3
D       SH1147  48  rep1
E       SH1147  48  rep2
F       SH1147  48  rep3
H       SH1150  24  rep1
I       SH1150  24  rep2
J       SH1150  24  rep3
K       SH1150  48  rep1
L       SH1150  48  rep2
M       SH1150  48  rep3
N       SHcontrol   24  rep1
O       SHcontrol   24  rep2
P       SHcontrol   24  rep3
Q       SHcontrol   48  rep1
R       SHcontrol   48  rep2
S       SHcontrol   48  rep3


So, what I would like to be able to compare in the end is the following:

• Compare, for each timepoint, SH1147 to SHcontrol and SH1150 to SHcontrol
• I would like to compare the entire expression profile across the timepoints for SH1147 versus control and SH1150 versus control.
• I would also like to investigate the differences between the two SHrnas versus the control (they target the same gene).

Based on what I have found on a range of help pages and the DEseq2 package help I have given the following design formulas a try but neither of them are giving me the comparisons I need and all of them use rep1 and timepoint24 as a baseline. So I am really missing something here. I am also not sure which formula correctly captures the linkage between my samples within my groups: because this is within groups, I am hesitant to use the first formulas where rep is a seperate variable in the design formula.

According to what I have read the minimum is for the design without the linkage is this

design = ~ timepoint + condition + timepoint:condition
reduced = ~ timepoint


Based on this example in the DEseq2 support pages I think the initial design without the timepoints is this (just looking at the linkage)

design = ~ condition + condition:rep + condition:timepoint


I have tried combining these things in various ways to get a formula that will give me the comparisons I am looking for but I am just not getting it right. Here is an overview of a number of designs I have tried (#4 would be the most direct combination of the two I imagine but also does not work):

#1------------------------------------------------------------------------------------------------------------
dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ timepoint + condition + rep + timepoint:condition + rep:condition + rep:timepoint)

dds <- DESeq(dds, test = "LRT", reduced = ~ rep + rep:timepoint)

> resultsNames(dds)
[1] "Intercept"                        "timepoint_48_vs_24"               "timepoint_72_vs_24"
[4] "timepoint_96_vs_24"               "timepoint_120_vs_24"              "condition_SH1147_vs_SHcontrol378"
[7] "condition_SH1150_vs_SHcontrol378" "rep_2_vs_1"                       "rep_3_vs_1"
[10] "timepoint48.conditionSH1147"      "timepoint72.conditionSH1147"      "timepoint96.conditionSH1147"
[13] "timepoint120.conditionSH1147"     "timepoint48.conditionSH1150"      "timepoint72.conditionSH1150"
[16] "timepoint96.conditionSH1150"      "timepoint120.conditionSH1150"     "conditionSH1147.rep2"
[19] "conditionSH1150.rep2"             "conditionSH1147.rep3"             "conditionSH1150.rep3"
[22] "timepoint48.rep2"                 "timepoint72.rep2"                 "timepoint96.rep2"
[25] "timepoint120.rep2"                "timepoint48.rep3"                 "timepoint72.rep3"
[28] "timepoint96.rep3"                 "timepoint120.rep3"

#2------------------------------------------------------------------------------------------------------------

dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ timepoint + condition + rep + timepoint:condition + rep:condition + rep:timepoint)

dds <- DESeq(dds, test = "LRT", reduced = ~ rep + timepoint + rep:timepoint)

> resultsNames(dds)
[1] "Intercept"                        "timepoint_48_vs_24"               "timepoint_72_vs_24"
[4] "timepoint_96_vs_24"               "timepoint_120_vs_24"              "condition_SH1147_vs_SHcontrol378"
[7] "condition_SH1150_vs_SHcontrol378" "rep_2_vs_1"                       "rep_3_vs_1"
[10] "timepoint48.conditionSH1147"      "timepoint72.conditionSH1147"      "timepoint96.conditionSH1147"
[13] "timepoint120.conditionSH1147"     "timepoint48.conditionSH1150"      "timepoint72.conditionSH1150"
[16] "timepoint96.conditionSH1150"      "timepoint120.conditionSH1150"     "conditionSH1147.rep2"
[19] "conditionSH1150.rep2"             "conditionSH1147.rep3"             "conditionSH1150.rep3"
[22] "timepoint48.rep2"                 "timepoint72.rep2"                 "timepoint96.rep2"
[25] "timepoint120.rep2"                "timepoint48.rep3"                 "timepoint72.rep3"
[28] "timepoint96.rep3"                 "timepoint120.rep3"

#3------------------------------------------------------------------------------------------------------------

dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ condition + condition:rep + condition:timepoint)

dds <- DESeq(dds, test = "LRT", reduced = ~ timepoint)

> resultsNames(dds)
[1] "Intercept"                          "condition_SH1147_vs_SHcontrol378"   "condition_SH1150_vs_SHcontrol378"
[4] "conditionSHcontrol378.rep2"         "conditionSH1147.rep2"               "conditionSH1150.rep2"
[7] "conditionSHcontrol378.rep3"         "conditionSH1147.rep3"               "conditionSH1150.rep3"
[10] "conditionSHcontrol378.timepoint48"  "conditionSH1147.timepoint48"        "conditionSH1150.timepoint48"
[13] "conditionSHcontrol378.timepoint72"  "conditionSH1147.timepoint72"        "conditionSH1150.timepoint72"
[16] "conditionSHcontrol378.timepoint96"  "conditionSH1147.timepoint96"        "conditionSH1150.timepoint96"
[19] "conditionSHcontrol378.timepoint120" "conditionSH1147.timepoint120"       "conditionSH1150.timepoint120"

#4------------------------------------------------------------------------------------------------------------

dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ condition + timepoint + condition:rep + condition:timepoint)

dds <- DESeq(dds, test = "LRT", reduced = ~ timepoint)

> resultsNames(dds)
[1] "Intercept"                        "condition_SH1147_vs_SHcontrol378" "condition_SH1150_vs_SHcontrol378"
[4] "timepoint_48_vs_24"               "timepoint_72_vs_24"               "timepoint_96_vs_24"
[7] "timepoint_120_vs_24"              "conditionSHcontrol378.rep2"       "conditionSH1147.rep2"
[10] "conditionSH1150.rep2"             "conditionSHcontrol378.rep3"       "conditionSH1147.rep3"
[13] "conditionSH1150.rep3"             "conditionSH1147.timepoint48"      "conditionSH1150.timepoint48"
[16] "conditionSH1147.timepoint72"      "conditionSH1150.timepoint72"      "conditionSH1147.timepoint96"
[19] "conditionSH1150.timepoint96"      "conditionSH1147.timepoint120"     "conditionSH1150.timepoint120"

#5------------------------------------------------------------------------------------------------------------
dds = DESeqDataSetFromTximport(txi = txi, colData = design, design = ~ condition + timepoint + condition:rep + condition:timepoint)

dds <- DESeq(dds, test = "LRT", reduced = ~ condition)

> resultsNames(dds)
[1] "Intercept"                        "condition_SH1147_vs_SHcontrol378" "condition_SH1150_vs_SHcontrol378"
[4] "timepoint_48_vs_24"               "timepoint_72_vs_24"               "timepoint_96_vs_24"
[7] "timepoint_120_vs_24"              "conditionSHcontrol378.rep2"       "conditionSH1147.rep2"
[10] "conditionSH1150.rep2"             "conditionSHcontrol378.rep3"       "conditionSH1147.rep3"
[13] "conditionSH1150.rep3"             "conditionSH1147.timepoint48"      "conditionSH1150.timepoint48"
[16] "conditionSH1147.timepoint72"      "conditionSH1150.timepoint72"      "conditionSH1147.timepoint96"
[19] "conditionSH1150.timepoint96"      "conditionSH1147.timepoint120"     "conditionSH1150.timepoint120"


I have also considered analysing the data per condition or per timepoint but I feel like that is also not the way and also prevents me from doing everything I want to do as I described above.

It would be really great if you could help me with this.

Thanks!

Dorien

1
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

In DESeq2, you can't control for variation within the group and compare across groups that contain different levels. E.g you can't control for replicate and compare across condition (e.g. treated vs control at 24 hours controlling for replicate variation present at 24 hours) because those variables are confounded, and it's not possible to estimate those distinctly with fixed effects. The examples in the vignette involve comparisons within individual (e.g. compared within individual to baseline, or comparison of the effect to baseline across groups), but since you have no baseline measurements, those designs won't work for you.

You can instead use limma's duplicateCorrelation() framework to specify the replicate correlations. We don't have functionality like this in DESeq2.

0
Entering edit mode

Thanks a lot for your reply and suggestion! I'll have a look at that!! Does explain why I couldn't make it work.