DEseq2 design with paired individuals at two timepoints, two groups and two responses
1
0
Entering edit mode
@tamsynderrick-13020
Last seen 6.1 years ago

Hi, 

Please can I check my experimental design? I want to be sure it is appropriate for my data/questions. I am also confused that some apparently quite large differences between groups are not statistically significant. 

I have 96 samples in total: 48 samples from baseline and 48 samples at 1 month, from the same (paired) individuals. Of the 48 samples taken at baseline, all 48 were given surgery, then 24 received a drug (group A) and 24 received placebo (group B) (I actually don't know which is the drug and which is the placebo group, but in order to explain I will pretend these are the groups). At 1 month, 12/24 people in group A had outcome 0, and 12/24 of group A had outcome 1. Likewise in group B, 12/24 had outcome 0 and 12/24 had outcome 1. 

I want to understand gene expression changes following surgery (ie from baseline to 1 month) in groups A versus B, in outcomes 0 versus 1, and in A_0 versus A_1, B_0 versus B_1. 

I have used the post here (DESeq2 design confusion with three factors) and this part of the vignette (http://master.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#nested-indiv) as a guide for my design. I have a minimum of 12 in each group, so I think that is enough to combine the drug group and outcome factors in order to compare A_0 v A_1. So I did three separate analyses: one for A v B, one for 0 v 1, and another for comparisons between the four subgroups (A_0, A_1, B_0, B_1).

To compare A_0 v A_1 and B_0 v B_1, I have combined group and outcome into a single variable (A_0, A_1, B_0, B_1) = "grp". I have added a column called "ind.n" to control for the effect of group being a linear combination of the individuals (I added this column to my sample table, however I have since tried adding it to the dds and reassigning the matrix design and it made no difference to the results). I have the following sample table:

> sampleTable
    grp   id         cnd   ind.n
1  A_0 DT0317  T0     1
2  A_0 DT0351  T0     2
3  A_0 DT0381  T0     3
4  A_0 DT0386  T0     4
5  A_0 DT0456  T0     5
6  A_0 DT0496  T0     6
7  A_0 DT0540  T0     7
8  A_0 DT0586  T0     8
9  A_0 DT0703  T0     9
10 A_0 DT0817  T0    10
11 A_0 DT0890  T0    11
12 A_0 DT0960  T0    12
13 A_0 DT0317  T1     1
14 A_0 DT0351  T1     2
15 A_0 DT0381  T1     3
16 A_0 DT0386  T1     4
17 A_0 DT0456  T1     5
18 A_0 DT0496  T1     6
19 A_0 DT0540  T1     7
20 A_0 DT0586  T1     8
21 A_0 DT0703  T1     9
22 A_0 DT0817  T1    10
23 A_0 DT0890  T1    11
24 A_0 DT0960  T1    12
25 A_1 DT0116  T0     1
26 A_1 DT0134  T0     2
27 A_1 DT0157  T0     3
28 A_1 DT0161  T0     4
29 A_1 DT0183  T0     5
30 A_1 DT0190  T0     6
31 A_1 DT0200  T0     7
32 A_1 DT0216  T0     8
33 A_1 DT0335  T0     9
34 A_1 DT0403  T0    10
35 A_1 DT0443  T0    11
36 A_1 DT0452  T0    12
37 A_1 DT0116  T1     1
38 A_1 DT0134  T1     2
39 A_1 DT0161  T1     3
40 A_1 DT0157  T1     4
41 A_1 DT0183  T1     5
42 A_1 DT0190  T1     6
43 A_1 DT0216  T1     7
44 A_1 DT0200  T1     8
45 A_1 DT0335  T1     9
46 A_1 DT0403  T1    10
47 A_1 DT0443  T1    11
48 A_1 DT0452  T1    12
49 B_0 DT0016  T0     1
50 B_0 DT0133  T0     2
51 B_0 DT0318  T0     3
52 B_0 DT0326  T0     4
53 B_0 DT0374  T0     5
54 B_0 DT0405  T0     6
55 B_0 DT0549  T0     7
56 B_0 DT0613  T0     8
57 B_0 DT0625  T0     9
58 B_0 DT0646  T0    10
59 B_0 DT0670  T0    11
60 B_0 DT0987  T0    12
61 B_0 DT0016  T1     1
62 B_0 DT0133  T1     2
63 B_0 DT0318  T1     3
64 B_0 DT0326  T1     4
65 B_0 DT0374  T1     5
66 B_0 DT0405  T1     6
67 B_0 DT0549  T1     7
68 B_0 DT0613  T1     8
69 B_0 DT0625  T1     9
70 B_0 DT0646  T1    10
71 B_0 DT0670  T1    11
72 B_0 DT0987  T1    12
73 B_1 DT0096  T0     1
74 B_1 DT0165  T0     2
75 B_1 DT0184  T0     3
76 B_1 DT0281  T0     4
77 B_1 DT0310  T0     5
78 B_1 DT0370  T0     6
79 B_1 DT0465  T0     7
80 B_1 DT0609  T0     8
81 B_1 DT0689  T0     9
82 B_1 DT0733  T0    10
83 B_1 DT0759  T0    11
84 B_1 DT0962  T0    12
85 B_1 DT0096  T1     1
86 B_1 DT0165  T1     2
87 B_1 DT0184  T1     3
88 B_1 DT0281  T1     4
89 B_1 DT0310  T1     5
90 B_1 DT0370  T1     6
91 B_1 DT0465  T1     7
92 B_1 DT0609  T1     8
93 B_1 DT0689  T1     9
94 B_1 DT0733  T1    10
95 B_1 DT0759  T1    11
96 B_1 DT0962  T1    12

I have the following design:

> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ grp + grp:ind.n + grp:cnd)

Then I can get the expression changes over time for each group:

> res <- results(dds, name="grpA_0.cndT1")

> res <- results(dds, name="grpA_1.cndT1")

> res <- results(dds, name="grpB_0.cndT1")

> res <- results(dds, name="grpB_1.cndT1")

And compare the expression changes over time between groups:

> res<-results(dds, contrast=list("grpA_0.cndT1","grpA_1.cndT1"))

> res<-results(dds, contrast=list("grpB_0.cndT1","grpB_1.cndT1"))

 

Is this experimental design and the application of "ind.n" appropriate for my analysis? I have followed the same approach for comparisons of grp = A or B, and 0 or 1 by repeating the analyses (with 24 samples in each group). 

In the results for "grpA_0.cndT1" I get 328 genes with Padj<0.05 and logFC> +/- 0.32 (equivalent to a FC of 1.25). For  "grpA_1.cndT1" I get 1283 genes significant by these same parameters. However when I contrast the two, there are no significantly differentially expressed genes (the lowest Padj is 0.1). To take one gene as an example, IL11, in group A_0 it has a log2FC of 5.75 and Padj of 4.86E-4, and in group A_1 it has a log2FC of 9.52 and a Padj of 2.77E-11. BaseMean is 140.99. In the contrast of A_0 v A_1, IL11 has a log2FC of -3.77 and Padj=0.57. Perhaps there is a lot of variation/dispersion within these groups and that is why the difference in expression of IL11 between A_0 and A_1 is not statistically significant. But given the differences in the numbers of DE genes (328 and 1283), I am a little surprised/disappointed there are no significant differences in the contrast. 

Hence I want to double check that my experimental design is correct for my data/questions, to verify that these results are real and not a result of an error in my design. I would be really grateful if you are able to help me. 

Many thanks

Tamsyn

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocInstaller_1.26.0       plyr_1.8.4                 DESeq2_1.16.1             
 [4] SummarizedExperiment_1.6.1 DelayedArray_0.2.2         matrixStats_0.52.2        
 [7] Biobase_2.36.2             GenomicRanges_1.28.2       GenomeInfoDb_1.12.0       
[10] IRanges_2.10.1             S4Vectors_0.14.1           BiocGenerics_0.22.0       

loaded via a namespace (and not attached):
 [1] genefilter_1.58.1       locfit_1.5-9.1          splines_3.4.0          
 [4] lattice_0.20-35         colorspace_1.3-2        htmltools_0.3.6        
 [7] base64enc_0.1-3         survival_2.41-3         XML_3.98-1.7           
[10] rlang_0.1               DBI_0.6-1               foreign_0.8-68         
[13] BiocParallel_1.10.1     RColorBrewer_1.1-2      GenomeInfoDbData_0.99.0
[16] stringr_1.2.0           zlibbioc_1.22.0         munsell_0.4.3          
[19] gtable_0.2.0            htmlwidgets_0.8         memoise_1.1.0          
[22] latticeExtra_0.6-28     knitr_1.15.1            geneplotter_1.54.0     
[25] AnnotationDbi_1.38.0    htmlTable_1.9           Rcpp_0.12.10           
[28] readr_1.1.1             acepack_1.4.1           xtable_1.8-2           
[31] scales_0.4.1            backports_1.0.5         checkmate_1.8.2        
[34] Hmisc_4.0-3             annotate_1.54.0         XVector_0.16.0         
[37] gridExtra_2.2.1         hms_0.3                 ggplot2_2.2.1          
[40] digest_0.6.12           stringi_1.1.5           grid_3.4.0             
[43] tools_3.4.0             bitops_1.0-6            magrittr_1.5           
[46] RSQLite_1.1-2           lazyeval_0.2.0          RCurl_1.95-4.8         
[49] tibble_1.3.1            Formula_1.2-1           cluster_2.0.6          
[52] MASS_7.3-47             Matrix_1.2-10           data.table_1.10.4      
[55] R6_2.2.1                rpart_4.1-11            nnet_7.3-12            
[58] compiler_3.4.0         

 

 

 

deseq2 multiple factor design multiple time points paired samples • 2.1k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 19 hours ago
United States

hi Tamsyn,

Yes your design and use of the factor variable ind.n looks correct.

Regarding the question of why the contrasts are not significant, while the time comparisons within groups are significant and with large effect size, this can definitely happen. I have a dataset recently where the effects were really large and significant, and the log2FoldChanges not close to each other, but still not significant, because there was a lot of uncertainty in exactly how big the LFC was although clearly no where near 0.

ADD COMMENT

Login before adding your answer.

Traffic: 731 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6