To Whom It May Concern,
I have been analyzing two large RNAseq projects separately and successfully. Recently, it was suggested that I merge the analyses so that normalized gene counts could be compared (not statistically) between the two experiments while also retaining the ability to perform statistical tests within the experiments. The two experimental designs differ greatly and I am having some trouble designing a model for the data that works without returning:
Error in checkFullRank(modelMatrix)
My goal is to generate one dataset normalized together and then run statistical analysis within the experiments: the effect of conditions at each stage for both groups. The code I used for the separate experimental (group) analyses worked, but the design falls apart when I attempt to merge them together. This is obvious because the experimental designs are different: Two way Vs Multifactorial. I am just confused about how to code this appropriately.
Here is the sample table (sorry for the jargon); simply the experiments are "group" Temp and Anoxia, on different stages of development with varying conditions. Following is the attempt I most recently made.
> sampleTable
samples condition stage group
1 A1 esc Epiboly Temp
2 A2 esc Epiboly Temp
3 A3 esc Epiboly Temp
4 B1 esc NeuralKeel Temp
5 B2 esc NeuralKeel Temp
6 B3 esc NeuralKeel Temp
7 C1 esc 6somites Temp
8 C2 esc 6somites Temp
9 C3 esc 6somites Temp
10 D1 esc 10somites Temp
11 D2 esc 10somites Temp
12 D3 esc 10somites Temp
13 E1 esc 16somites Temp
14 E2 esc 16somites Temp
15 E3 esc 16somites Temp
16 F1 esc 20somites Temp
17 F2 esc 20somites Temp
18 F3 esc 20somites Temp
19 G1 esc 24somites Temp
20 G2 esc 24somites Temp
21 G3 esc 24somites Temp
22 H1 diap Epiboly Temp
23 H2 diap Epiboly Temp
24 H3 diap Epiboly Temp
25 I1 diap NeuralKeel Temp
26 I2 diap NeuralKeel Temp
27 I3 diap NeuralKeel Temp
28 J1 diap 6somites Temp
29 J2 diap 6somites Temp
30 J3 diap 6somites Temp
31 K1 diap 10somites Temp
32 K2 diap 10somites Temp
33 K3 diap 10somites Temp
34 L1 diap 16somites Temp
35 L2 diap 16somites Temp
36 L3 diap 16somites Temp
37 M1 diap 20somites Temp
38 M2 diap 20somites Temp
39 M3 diap 20somites Temp
40 N1 diap 24somites Temp
41 N2 diap 24somites Temp
42 N3 diap 24somites Temp
43 Wa1 t0 dpd_0 Anoxia
44 Wa2 t0 dpd_0 Anoxia
45 Wa3 t0 dpd_0 Anoxia
46 Wa4 t0 dpd_0 Anoxia
47 Wb1 Early_Anoxia dpd_0 Anoxia
48 Wb2 Early_Anoxia dpd_0 Anoxia
49 Wb3 Early_Anoxia dpd_0 Anoxia
50 Wb4 Early_Anoxia dpd_0 Anoxia
51 Wc1 Late_Anoxia dpd_0 Anoxia
52 Wc2 Late_Anoxia dpd_0 Anoxia
53 Wc3 Late_Anoxia dpd_0 Anoxia
54 Wc4 Late_Anoxia dpd_0 Anoxia
55 Wd1 Early_Rec dpd_0 Anoxia
56 Wd2 Early_Rec dpd_0 Anoxia
57 Wd3 Early_Rec dpd_0 Anoxia
58 Wd4 Early_Rec dpd_0 Anoxia
59 We1 Late_Rec dpd_0 Anoxia
60 We2 Late_Rec dpd_0 Anoxia
61 We3 Late_Rec dpd_0 Anoxia
62 We4 Late_Rec dpd_0 Anoxia
63 Xa1 t0 dpd_4 Anoxia
64 Xa2 t0 dpd_4 Anoxia
65 Xa3 t0 dpd_4 Anoxia
66 Xa4 t0 dpd_4 Anoxia
67 Xb1 Early_Anoxia dpd_4 Anoxia
68 Xb2 Early_Anoxia dpd_4 Anoxia
69 Xb3 Early_Anoxia dpd_4 Anoxia
70 Xb4 Early_Anoxia dpd_4 Anoxia
71 Xc1 Late_Anoxia dpd_4 Anoxia
72 Xc2 Late_Anoxia dpd_4 Anoxia
73 Xc3 Late_Anoxia dpd_4 Anoxia
74 Xc4 Late_Anoxia dpd_4 Anoxia
75 Xd1 Early_Rec dpd_4 Anoxia
76 Xd2 Early_Rec dpd_4 Anoxia
77 Xd3 Early_Rec dpd_4 Anoxia
78 Xd4 Early_Rec dpd_4 Anoxia
79 Xe1 Late_Rec dpd_4 Anoxia
80 Xe2 Late_Rec dpd_4 Anoxia
81 Xe3 Late_Rec dpd_4 Anoxia
82 Xe4 Late_Rec dpd_4 Anoxia
83 Ya1 t0 dpd_12 Anoxia
84 Ya2 t0 dpd_12 Anoxia
85 Ya3 t0 dpd_12 Anoxia
86 Ya4 t0 dpd_12 Anoxia
87 Ya5 t0 dpd_12 Anoxia
88 Ya6 t0 dpd_12 Anoxia
89 Yb1 Early_Anoxia dpd_12 Anoxia
90 Yb2 Early_Anoxia dpd_12 Anoxia
91 Yb3 Early_Anoxia dpd_12 Anoxia
92 Yb4 Early_Anoxia dpd_12 Anoxia
93 Yb5 Early_Anoxia dpd_12 Anoxia
94 Yb6 Early_Anoxia dpd_12 Anoxia
95 Yc1 Late_Anoxia dpd_12 Anoxia
96 Yc2 Late_Anoxia dpd_12 Anoxia
97 Yc3 Late_Anoxia dpd_12 Anoxia
98 Yc4 Late_Anoxia dpd_12 Anoxia
99 Yc5 Late_Anoxia dpd_12 Anoxia
100 Yc6 Late_Anoxia dpd_12 Anoxia
101 Yd1 Early_Rec dpd_12 Anoxia
102 Yd2 Early_Rec dpd_12 Anoxia
103 Yd3 Early_Rec dpd_12 Anoxia
104 Yd4 Early_Rec dpd_12 Anoxia
105 Yd5 Early_Rec dpd_12 Anoxia
106 Yd6 Early_Rec dpd_12 Anoxia
107 Ye1 Late_Rec dpd_12 Anoxia
108 Ye2 Late_Rec dpd_12 Anoxia
109 Ye3 Late_Rec dpd_12 Anoxia
110 Ye4 Late_Rec dpd_12 Anoxia
111 Ye5 Late_Rec dpd_12 Anoxia
112 Ye6 Late_Rec dpd_12 Anoxia
113 Za1 t0 dpd_20 Anoxia
114 Za2 t0 dpd_20 Anoxia
115 Za3 t0 dpd_20 Anoxia
116 Za4 t0 dpd_20 Anoxia
117 Zb1 Early_Anoxia dpd_20 Anoxia
118 Zb2 Early_Anoxia dpd_20 Anoxia
119 Zb3 Early_Anoxia dpd_20 Anoxia
120 Zb4 Early_Anoxia dpd_20 Anoxia
121 Zc1 Late_Anoxia dpd_20 Anoxia
122 Zc2 Late_Anoxia dpd_20 Anoxia
123 Zc3 Late_Anoxia dpd_20 Anoxia
124 Zc4 Late_Anoxia dpd_20 Anoxia
125 Zd1 Early_Rec dpd_20 Anoxia
126 Zd2 Early_Rec dpd_20 Anoxia
127 Zd3 Early_Rec dpd_20 Anoxia
128 Zd4 Early_Rec dpd_20 Anoxia
129 Ze1 Late_Rec dpd_20 Anoxia
130 Ze2 Late_Rec dpd_20 Anoxia
131 Ze3 Late_Rec dpd_20 Anoxia
132 Ze4 Late_Rec dpd_20 Anoxia
> dds <- DESeqDataSet(se, design = ~ group)
(this I had to troubleshoot extensively but is likely the contributor to my problems)
> dds
class: DESeqDataSet
dim: 26751 132
exptData(0):
assays(1): counts
rownames(26751): 12S 16S ... zzef1 zzz3
rowRanges metadata column names(0):
colnames(132): A1 A2 ... Ze3 Ze4
colData names(4): samples condition stage group
> dds <- estimateSizeFactors(dds)
> mnc <- rowMeans(counts(dds, normalized=TRUE))
> dds1 <- dds[ mnc > 1, ]
> dds1
class: DESeqDataSet
dim: 25945 132
exptData(0):
assays(1): counts
rownames(25945): 12S 16S ... zzef1 zzz3
rowRanges metadata column names(0):
colnames(132): A1 A2 ... Ze3 Ze4
colData names(5): samples condition stage group sizeFactor
> dds1.Epiboly <- dds1[dds1$group == "Temp", dds1$stage == "Epiboly", design= ~condition ]
> dds1.Epiboly <- DESeq(dds1.Epiboly)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
Levels or combinations of levels without any samples have resulted in
column(s) of zeros in the model matrix.
See the section 'Model matrix not full rank' in vignette('DESeq2')
Thank you for any help with this!
> sessionInfo() R version 3.2.1 (2015-06-18) Platform: x86_64-unknown-linux-gnu (64-bit) Running under: CentOS release 6.8 (Final) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] DESeq2_1.8.2 RcppArmadillo_0.6.500.4.0 [3] Rcpp_0.12.3 GenomicFeatures_1.20.6 [5] AnnotationDbi_1.30.1 Biobase_2.28.0 [7] BiocParallel_1.2.22 GenomicAlignments_1.4.2 [9] Rsamtools_1.20.5 Biostrings_2.36.4 [11] XVector_0.8.0 GenomicRanges_1.20.8 [13] GenomeInfoDb_1.4.3 IRanges_2.2.9 [15] S4Vectors_0.6.6 BiocGenerics_0.14.0 loaded via a namespace (and not attached): [1] RColorBrewer_1.1-2 futile.logger_1.4.1 plyr_1.8.3 [4] bitops_1.0-6 futile.options_1.0.0 tools_3.2.1 [7] zlibbioc_1.14.0 rpart_4.1-10 biomaRt_2.24.1 [10] annotate_1.46.1 RSQLite_1.0.0 gtable_0.1.2 [13] lattice_0.20-33 DBI_0.3.1 gridExtra_2.0.0 [16] genefilter_1.50.0 cluster_2.0.3 rtracklayer_1.28.10 [19] locfit_1.5-9.1 nnet_7.3-12 grid_3.2.1 [22] XML_3.98-1.3 survival_2.38-3 foreign_0.8-66 [25] latticeExtra_0.6-26 Formula_1.2-1 geneplotter_1.46.0 [28] ggplot2_2.0.0 lambda.r_1.1.7 scales_0.3.0 [31] Hmisc_3.17-1 splines_3.2.1 xtable_1.8-0 [34] colorspace_1.2-6 acepack_1.3-3.3 RCurl_1.95-4.7 [37] munsell_0.4.2

Hi Michael,
Thank you! This is a much better approach.
-Amie