Question: deseq2 multifactor timeseries help
0
gravatar for sukhi17
2.5 years ago by
sukhi170
sukhi170 wrote:

Hi, 

I have been trying to learn deseq2 and managed to work on a two factorial design (followed a similar post DESeq2 Model Design and worked through my analysis, thanks to Michael). I'm now into a three factor time-series design and would need some help please in designing my model and describing contrasts. I have been looking into lots of forum questions but still confused, have also read the vignette.   

Here is my design: Group: A and B, Treatment: control and treated, time: 0,1,2,4 - 3 replicates each

group treatment timepoints
1         A         control         T0
2         A         control         T0
3         A         control         T0
4         B         control         T0
5         B         control         T0
6         B         control         T0
7         A         control         T1
8         A         control         T1
9         A         control         T1
10        B         control         T1
11        B         control         T1
12        B         control         T1
13        A         treated         T1
14        A         treated         T1
15        A         treated         T1
16        B         treated         T1
17        B         treated         T1
18        B         treated         T1
19        A         control         T2
20        A         control         T2
21        A         control         T2
22        B         control         T2
23        B         control         T2
24        B         control         T2
25        A         treated         T2
26        A         treated         T2
27        A         treated         T2
28        B         treated         T2
29        B         treated         T2
30        B         treated         T2
31        A         control         T4
32        A         control         T4
33        A         control         T4
34        B         control         T4
35        B         control         T4
36        B         control         T4
37        A         treated         T4
38        A         treated         T4
39        A         treated         T4
40        B         treated         T4
41        B         treated         T4
42        B         treated         T4

key questions for  DE:

1) time-series DE within a specific group, 2)  time-series DE within specific treatment 3) group vs treatment time series combinations

Different model designs tried:

 ~group+treatment+timepoints+treatment:group

~group+treatment+timepoints+group:timepoints

~group+group:treatment+group:timepoints -  test="LRT", reduced = ~group+group:timepoints

I am having a hard time to understand what design should I use for a complete analysis (I should add in: newbie in R, so learning as I go). 

> sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_New Zealand.1252  LC_CTYPE=English_New Zealand.1252   
[3] LC_MONETARY=English_New Zealand.1252 LC_NUMERIC=C                        
[5] LC_TIME=English_New Zealand.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_2.2.0              DESeq2_1.10.1              RcppArmadillo_0.7.500.0.0 
 [4] Rcpp_0.12.8                SummarizedExperiment_1.0.2 Biobase_2.30.0            
 [7] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3         IRanges_2.4.8             
[10] S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] genefilter_1.52.1    locfit_1.5-9.1       splines_3.2.2        lattice_0.20-34     
 [5] colorspace_1.3-1     htmltools_0.3.5      survival_2.40-1      XML_3.98-1.5        
 [9] foreign_0.8-67       DBI_0.5-1            BiocParallel_1.4.3   RColorBrewer_1.1-2  
[13] lambda.r_1.1.9       plyr_1.8.4           stringr_1.1.0        zlibbioc_1.16.0     
[17] munsell_0.4.3        gtable_0.2.0         futile.logger_1.4.3  memoise_1.0.0       
[21] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.48.0   AnnotationDbi_1.32.3
[25] htmlTable_1.7        acepack_1.4.1        xtable_1.8-2         openssl_0.9.5       
[29] scales_0.4.1         base64_2.0           Hmisc_4.0-1          annotate_1.48.0     
[33] XVector_0.10.0       gridExtra_2.2.1      digest_0.6.10        stringi_1.1.2       
[37] grid_3.2.2           tools_3.2.2          magrittr_1.5         lazyeval_0.2.0      
[41] tibble_1.2           RSQLite_1.1          Formula_1.2-1        cluster_2.0.5       
[45] futile.options_1.0.0 Matrix_1.2-7.1       data.table_1.10.0    assertthat_0.1      
[49] rpart_4.1-10         nnet_7.3-12         

 Thanks everyone for your time!!!

 

 

ADD COMMENTlink modified 2.4 years ago by ana0 • written 2.5 years ago by sukhi170

Dear Michael,

I'm analysing a RNA-seq dataset with a very similar design to the one described by Sukhi, except that I do not have control and infected samples at time 0 (only have control and infected samples at T1, T2, T3).

The approach I took was combining all 3 factors (Group, Treatment, Time) into a single factor and using design=~ Donor + Group_Treatment_Time. Would this be a good strategy?

I'm interested in time-specific differences:
i) between control and infection for each group
ii) between groups for control (or infection)

and I'm interested in finding genes with a difference in baseline expression (a main effect), ie lines moving in parallel:
iii) between control and infection for each group
iv) between groups for control (or infection)

Thank you for your time.

ADD REPLYlink written 2.4 years ago by ana0

I'd recommend you partner with a local statistician as well. You have donor, group, treatment and time variables, and there are many ways to do the modeling. Note that it's not DESeq2 specific, any modeling you can do with linear models in R you can use DESeq2 to do.

ADD REPLYlink written 2.4 years ago by Michael Love25k
Answer: deseq2 multifactor timeseries help
1
gravatar for Michael Love
2.5 years ago by
Michael Love25k
United States
Michael Love25k wrote:

hi Sukhi,

You can take a look at the time series example in the workflow:

http://www.bioconductor.org/help/workflows/rnaseqGene/#time

Supposing you didn't have the "group" variable, a difference from that time series example and your data is that you have no treated samples at time 0. So you cannot estimate a "treatment" effect (this is the difference at time 0 between treated and control). Instead you can create a model matrix with just the interaction terms but no "treatment" term:

mm <- model.matrix(~timepoint + treatment:timepoint)

Then you will have to remove the column of 0's from 'mm' because there are no treated samples at time=0. Let's call that cleaned up matrix 'mm.full'. Then, the test for differences in the treated samples across time would be to subset the interaction terms out of 'mm.full', and provide the smaller matrix to the 'reduced' argument. If we call that smaller matrix 'mm.reduced' it would look like:

dds <- DESeq(dds, full=mm.full, reduced=mm.reduced, test="LRT")
res <- results(dds)

Things are complicated by having the "group" variable. In fact there are a couple of possible designs here, and therefore I would recommend you partner with a local statistician who can help you formulate specific designs and contrasts, and interpret results.

ADD COMMENTlink written 2.5 years ago by Michael Love25k

Hi Michael,

I have relabeled the samples as group_treatment instead of keeping them as two separate factors (group and treatment separately) to answer important biological questions.

> sampleTable
   condition timepoints
1  control_A         T0
2  control_A         T0
3  control_A         T0
4  control_B         T0
5  control_B         T0
6  control_B         T0
7  control_A         T1
8  control_A         T1
9  control_A         T1
10 control_B         T1
11 control_B         T1
12 control_B         T1
13 treated_A         T1
14 treated_A         T1
15 treated_A         T1
16 treated_B         T1
17 treated_B         T1
18 treated_B         T1
19 control_A         T2
20 control_A         T2
21 control_A         T2
22 control_B         T2
23 control_B         T2
24 control_B         T2
25 treated_A         T2
26 treated_A         T2
27 treated_A         T2
28 treated_B         T2
29 treated_B         T2
30 treated_B         T2
31 control_A         T4
32 control_A         T4
33 control_A         T4
34 control_B         T4
35 control_B         T4
36 control_B         T4
37 treated_A         T4
38 treated_A         T4
39 treated_A         T4
40 treated_B         T4
41 treated_B         T4
42 treated_B         T4

Because of missing samples/timepoints, I have removed the column of 0's as suggested and here is the code below:

mm <- model.matrix(~timepoints+condition:timepoints)

 

all.zero <- apply(mm, 2, function(x) all(x == 0))
mm.full <- mm[, !all.zero]

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = sampleTable, design = ~ timepoints+condition)

mm.reduced <- model.matrix(~timepoints+condition)

ddsTC <- DESeq(dds, full = mm.full, reduced = mm.reduced, test="LRT")

> resultsNames(ddsTC)
 [1] "Intercept"                       "timepointsT1"                   
 [3] "timepointsT2"                    "timepointsT4"                   
 [5] "timepointsT0.conditioncontrol_B" "timepointsT1.conditioncontrol_B"
 [7] "timepointsT2.conditioncontrol_B" "timepointsT4.conditioncontrol_B"
 [9] "timepointsT1.conditiontreated_A" "timepointsT2.conditiontreated_A"
[11] "timepointsT4.conditiontreated_A" "timepointsT1.conditiontreated_B"
[13] "timepointsT2.conditiontreated_B" "timepointsT4.conditiontreated_B"

Does this look OK? I am particularly interested in what design to specify at the line

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = sampleTable, design = ~ timepoints+condition). 

For looking at effect in control_B sample between T1 and T0:

results(ddsTC, contrast=list(c("timepointsT1.conditioncontrol_B","timepointsT0.conditioncontrol_B")), test="Wald")

How do I create a contrast (LRT) to look at differences between  control_B vs control_A samples at T0 given that timepointsT0 and timepontsT0.conditioncontrol_A are used as base levels.

Please let me know if I am on the right track. Thanks heaps for looking at my queries. 

ADD REPLYlink written 2.3 years ago by sukhi170

This design doesn't work because the samples at time 0 for treatment B are not being used for comparisons of the later time points. I'm sorry but it's not a simple software question how to approach this analysis, and there are a number of statistical choices to make. I'm going to stick with my recommendation that you approach a biostatistician to work with on this analysis.

ADD REPLYlink written 2.3 years ago by Michael Love25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 310 users visited in the last hour