Question

DEseq2 design with paired individuals at two timepoints, two groups and two responses

0

Entering edit mode

tamsyn.derrick • 0

@tamsynderrick-13020

Last seen 5.5 years ago

Hi,

Please can I check my experimental design? I want to be sure it is appropriate for my data/questions. I am also confused that some apparently quite large differences between groups are not statistically significant.

I have 96 samples in total: 48 samples from baseline and 48 samples at 1 month, from the same (paired) individuals. Of the 48 samples taken at baseline, all 48 were given surgery, then 24 received a drug (group A) and 24 received placebo (group B) (I actually don't know which is the drug and which is the placebo group, but in order to explain I will pretend these are the groups). At 1 month, 12/24 people in group A had outcome 0, and 12/24 of group A had outcome 1. Likewise in group B, 12/24 had outcome 0 and 12/24 had outcome 1.

I want to understand gene expression changes following surgery (ie from baseline to 1 month) in groups A versus B, in outcomes 0 versus 1, and in A_0 versus A_1, B_0 versus B_1.

I have used the post here (DESeq2 design confusion with three factors) and this part of the vignette (http://master.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#nested-indiv) as a guide for my design. I have a minimum of 12 in each group, so I think that is enough to combine the drug group and outcome factors in order to compare A_0 v A_1. So I did three separate analyses: one for A v B, one for 0 v 1, and another for comparisons between the four subgroups (A_0, A_1, B_0, B_1).

To compare A_0 v A_1 and B_0 v B_1, I have combined group and outcome into a single variable (A_0, A_1, B_0, B_1) = "grp". I have added a column called "ind.n" to control for the effect of group being a linear combination of the individuals (I added this column to my sample table, however I have since tried adding it to the dds and reassigning the matrix design and it made no difference to the results). I have the following sample table:

> sampleTable
grp id cnd ind.n
1 A_0 DT0317 T0 1
2 A_0 DT0351 T0 2
3 A_0 DT0381 T0 3
4 A_0 DT0386 T0 4
5 A_0 DT0456 T0 5
6 A_0 DT0496 T0 6
7 A_0 DT0540 T0 7
8 A_0 DT0586 T0 8
9 A_0 DT0703 T0 9
10 A_0 DT0817 T0 10
11 A_0 DT0890 T0 11
12 A_0 DT0960 T0 12
13 A_0 DT0317 T1 1
14 A_0 DT0351 T1 2
15 A_0 DT0381 T1 3
16 A_0 DT0386 T1 4
17 A_0 DT0456 T1 5
18 A_0 DT0496 T1 6
19 A_0 DT0540 T1 7
20 A_0 DT0586 T1 8
21 A_0 DT0703 T1 9
22 A_0 DT0817 T1 10
23 A_0 DT0890 T1 11
24 A_0 DT0960 T1 12
25 A_1 DT0116 T0 1
26 A_1 DT0134 T0 2
27 A_1 DT0157 T0 3
28 A_1 DT0161 T0 4
29 A_1 DT0183 T0 5
30 A_1 DT0190 T0 6
31 A_1 DT0200 T0 7
32 A_1 DT0216 T0 8
33 A_1 DT0335 T0 9
34 A_1 DT0403 T0 10
35 A_1 DT0443 T0 11
36 A_1 DT0452 T0 12
37 A_1 DT0116 T1 1
38 A_1 DT0134 T1 2
39 A_1 DT0161 T1 3
40 A_1 DT0157 T1 4
41 A_1 DT0183 T1 5
42 A_1 DT0190 T1 6
43 A_1 DT0216 T1 7
44 A_1 DT0200 T1 8
45 A_1 DT0335 T1 9
46 A_1 DT0403 T1 10
47 A_1 DT0443 T1 11
48 A_1 DT0452 T1 12
49 B_0 DT0016 T0 1
50 B_0 DT0133 T0 2
51 B_0 DT0318 T0 3
52 B_0 DT0326 T0 4
53 B_0 DT0374 T0 5
54 B_0 DT0405 T0 6
55 B_0 DT0549 T0 7
56 B_0 DT0613 T0 8
57 B_0 DT0625 T0 9
58 B_0 DT0646 T0 10
59 B_0 DT0670 T0 11
60 B_0 DT0987 T0 12
61 B_0 DT0016 T1 1
62 B_0 DT0133 T1 2
63 B_0 DT0318 T1 3
64 B_0 DT0326 T1 4
65 B_0 DT0374 T1 5
66 B_0 DT0405 T1 6
67 B_0 DT0549 T1 7
68 B_0 DT0613 T1 8
69 B_0 DT0625 T1 9
70 B_0 DT0646 T1 10
71 B_0 DT0670 T1 11
72 B_0 DT0987 T1 12
73 B_1 DT0096 T0 1
74 B_1 DT0165 T0 2
75 B_1 DT0184 T0 3
76 B_1 DT0281 T0 4
77 B_1 DT0310 T0 5
78 B_1 DT0370 T0 6
79 B_1 DT0465 T0 7
80 B_1 DT0609 T0 8
81 B_1 DT0689 T0 9
82 B_1 DT0733 T0 10
83 B_1 DT0759 T0 11
84 B_1 DT0962 T0 12
85 B_1 DT0096 T1 1
86 B_1 DT0165 T1 2
87 B_1 DT0184 T1 3
88 B_1 DT0281 T1 4
89 B_1 DT0310 T1 5
90 B_1 DT0370 T1 6
91 B_1 DT0465 T1 7
92 B_1 DT0609 T1 8
93 B_1 DT0689 T1 9
94 B_1 DT0733 T1 10
95 B_1 DT0759 T1 11
96 B_1 DT0962 T1 12

I have the following design:

> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ grp + grp:ind.n + grp:cnd)

Then I can get the expression changes over time for each group:

> res <- results(dds, name="grpA_0.cndT1")

> res <- results(dds, name="grpA_1.cndT1")

> res <- results(dds, name="grpB_0.cndT1")

> res <- results(dds, name="grpB_1.cndT1")

And compare the expression changes over time between groups:

> res<-results(dds, contrast=list("grpA_0.cndT1","grpA_1.cndT1"))

> res<-results(dds, contrast=list("grpB_0.cndT1","grpB_1.cndT1"))

Is this experimental design and the application of "ind.n" appropriate for my analysis? I have followed the same approach for comparisons of grp = A or B, and 0 or 1 by repeating the analyses (with 24 samples in each group).

In the results for "grpA_0.cndT1" I get 328 genes with Padj<0.05 and logFC> +/- 0.32 (equivalent to a FC of 1.25). For "grpA_1.cndT1" I get 1283 genes significant by these same parameters. However when I contrast the two, there are no significantly differentially expressed genes (the lowest Padj is 0.1). To take one gene as an example, IL11, in group A_0 it has a log2FC of 5.75 and Padj of 4.86E-4, and in group A_1 it has a log2FC of 9.52 and a Padj of 2.77E-11. BaseMean is 140.99. In the contrast of A_0 v A_1, IL11 has a log2FC of -3.77 and Padj=0.57. Perhaps there is a lot of variation/dispersion within these groups and that is why the difference in expression of IL11 between A_0 and A_1 is not statistically significant. But given the differences in the numbers of DE genes (328 and 1283), I am a little surprised/disappointed there are no significant differences in the contrast.

Hence I want to double check that my experimental design is correct for my data/questions, to verify that these results are real and not a result of an error in my design. I would be really grateful if you are able to help me.

Many thanks

Tamsyn

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.26.0 plyr_1.8.4 DESeq2_1.16.1
[4] SummarizedExperiment_1.6.1 DelayedArray_0.2.2 matrixStats_0.52.2
[7] Biobase_2.36.2 GenomicRanges_1.28.2 GenomeInfoDb_1.12.0
[10] IRanges_2.10.1 S4Vectors_0.14.1 BiocGenerics_0.22.0

loaded via a namespace (and not attached):
[1] genefilter_1.58.1 locfit_1.5-9.1 splines_3.4.0
[4] lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.6
[7] base64enc_0.1-3 survival_2.41-3 XML_3.98-1.7
[10] rlang_0.1 DBI_0.6-1 foreign_0.8-68
[13] BiocParallel_1.10.1 RColorBrewer_1.1-2 GenomeInfoDbData_0.99.0
[16] stringr_1.2.0 zlibbioc_1.22.0 munsell_0.4.3
[19] gtable_0.2.0 htmlwidgets_0.8 memoise_1.1.0
[22] latticeExtra_0.6-28 knitr_1.15.1 geneplotter_1.54.0
[25] AnnotationDbi_1.38.0 htmlTable_1.9 Rcpp_0.12.10
[28] readr_1.1.1 acepack_1.4.1 xtable_1.8-2
[31] scales_0.4.1 backports_1.0.5 checkmate_1.8.2
[34] Hmisc_4.0-3 annotate_1.54.0 XVector_0.16.0
[37] gridExtra_2.2.1 hms_0.3 ggplot2_2.2.1
[40] digest_0.6.12 stringi_1.1.5 grid_3.4.0
[43] tools_3.4.0 bitops_1.0-6 magrittr_1.5
[46] RSQLite_1.1-2 lazyeval_0.2.0 RCurl_1.95-4.8
[49] tibble_1.3.1 Formula_1.2-1 cluster_2.0.6
[52] MASS_7.3-47 Matrix_1.2-10 data.table_1.10.4
[55] R6_2.2.1 rpart_4.1-11 nnet_7.3-12
[58] compiler_3.4.0
>

deseq2 multiple factor design multiple time points paired samples • 1.8k views

ADD COMMENT • link updated 6.9 years ago by Michael Love 42k • written 6.9 years ago by tamsyn.derrick • 0

score 0 · Answer 1 · 2017-06-30

hi Tamsyn,

Yes your design and use of the factor variable ind.n looks correct.

Regarding the question of why the contrasts are not significant, while the time comparisons within groups are significant and with large effect size, this can definitely happen. I have a dataset recently where the effects were really large and significant, and the log2FoldChanges not close to each other, but still not significant, because there was a lot of uncertainty in exactly how big the LFC was although clearly no where near 0.