Question

DESeq2 Model matrix design: Merging large RNAseq projects

0

Entering edit mode

arom2 • 0

@arom2-8204

Last seen 7.0 years ago

United States

To Whom It May Concern,

I have been analyzing two large RNAseq projects separately and successfully. Recently, it was suggested that I merge the analyses so that normalized gene counts could be compared (not statistically) between the two experiments while also retaining the ability to perform statistical tests within the experiments. The two experimental designs differ greatly and I am having some trouble designing a model for the data that works without returning:

Error in checkFullRank(modelMatrix)

My goal is to generate one dataset normalized together and then run statistical analysis within the experiments: the effect of conditions at each stage for both groups. The code I used for the separate experimental (group) analyses worked, but the design falls apart when I attempt to merge them together. This is obvious because the experimental designs are different: Two way Vs Multifactorial. I am just confused about how to code this appropriately.

Here is the sample table (sorry for the jargon); simply the experiments are "group" Temp and Anoxia, on different stages of development with varying conditions. Following is the attempt I most recently made.

> sampleTable

    samples    condition      stage  group
1        A1          esc    Epiboly   Temp
2        A2          esc    Epiboly   Temp
3        A3          esc    Epiboly   Temp
4        B1          esc NeuralKeel   Temp
5        B2          esc NeuralKeel   Temp
6        B3          esc NeuralKeel   Temp
7        C1          esc   6somites   Temp
8        C2          esc   6somites   Temp
9        C3          esc   6somites   Temp
10       D1          esc  10somites   Temp
11       D2          esc  10somites   Temp
12       D3          esc  10somites   Temp
13       E1          esc  16somites   Temp
14       E2          esc  16somites   Temp
15       E3          esc  16somites   Temp
16       F1          esc  20somites   Temp
17       F2          esc  20somites   Temp
18       F3          esc  20somites   Temp
19       G1          esc  24somites   Temp
20       G2          esc  24somites   Temp
21       G3          esc  24somites   Temp
22       H1         diap    Epiboly   Temp
23       H2         diap    Epiboly   Temp
24       H3         diap    Epiboly   Temp
25       I1         diap NeuralKeel   Temp
26       I2         diap NeuralKeel   Temp
27       I3         diap NeuralKeel   Temp
28       J1         diap   6somites   Temp
29       J2         diap   6somites   Temp
30       J3         diap   6somites   Temp
31       K1         diap  10somites   Temp
32       K2         diap  10somites   Temp
33       K3         diap  10somites   Temp
34       L1         diap  16somites   Temp
35       L2         diap  16somites   Temp
36       L3         diap  16somites   Temp
37       M1         diap  20somites   Temp
38       M2         diap  20somites   Temp
39       M3         diap  20somites   Temp
40       N1         diap  24somites   Temp
41       N2         diap  24somites   Temp
42       N3         diap  24somites   Temp
43      Wa1           t0      dpd_0 Anoxia
44      Wa2           t0      dpd_0 Anoxia
45      Wa3           t0      dpd_0 Anoxia
46      Wa4           t0      dpd_0 Anoxia
47      Wb1 Early_Anoxia      dpd_0 Anoxia
48      Wb2 Early_Anoxia      dpd_0 Anoxia
49      Wb3 Early_Anoxia      dpd_0 Anoxia
50      Wb4 Early_Anoxia      dpd_0 Anoxia
51      Wc1  Late_Anoxia      dpd_0 Anoxia
52      Wc2  Late_Anoxia      dpd_0 Anoxia
53      Wc3  Late_Anoxia      dpd_0 Anoxia
54      Wc4  Late_Anoxia      dpd_0 Anoxia
55      Wd1    Early_Rec      dpd_0 Anoxia
56      Wd2    Early_Rec      dpd_0 Anoxia
57      Wd3    Early_Rec      dpd_0 Anoxia
58      Wd4    Early_Rec      dpd_0 Anoxia
59      We1     Late_Rec      dpd_0 Anoxia
60      We2     Late_Rec      dpd_0 Anoxia
61      We3     Late_Rec      dpd_0 Anoxia
62      We4     Late_Rec      dpd_0 Anoxia
63      Xa1           t0      dpd_4 Anoxia
64      Xa2           t0      dpd_4 Anoxia
65      Xa3           t0      dpd_4 Anoxia
66      Xa4           t0      dpd_4 Anoxia
67      Xb1 Early_Anoxia      dpd_4 Anoxia
68      Xb2 Early_Anoxia      dpd_4 Anoxia
69      Xb3 Early_Anoxia      dpd_4 Anoxia
70      Xb4 Early_Anoxia      dpd_4 Anoxia
71      Xc1  Late_Anoxia      dpd_4 Anoxia
72      Xc2  Late_Anoxia      dpd_4 Anoxia
73      Xc3  Late_Anoxia      dpd_4 Anoxia
74      Xc4  Late_Anoxia      dpd_4 Anoxia
75      Xd1    Early_Rec      dpd_4 Anoxia
76      Xd2    Early_Rec      dpd_4 Anoxia
77      Xd3    Early_Rec      dpd_4 Anoxia
78      Xd4    Early_Rec      dpd_4 Anoxia
79      Xe1     Late_Rec      dpd_4 Anoxia
80      Xe2     Late_Rec      dpd_4 Anoxia
81      Xe3     Late_Rec      dpd_4 Anoxia
82      Xe4     Late_Rec      dpd_4 Anoxia
83      Ya1           t0     dpd_12 Anoxia
84      Ya2           t0     dpd_12 Anoxia
85      Ya3           t0     dpd_12 Anoxia
86      Ya4           t0     dpd_12 Anoxia
87      Ya5           t0     dpd_12 Anoxia
88      Ya6           t0     dpd_12 Anoxia
89      Yb1 Early_Anoxia     dpd_12 Anoxia
90      Yb2 Early_Anoxia     dpd_12 Anoxia
91      Yb3 Early_Anoxia     dpd_12 Anoxia
92      Yb4 Early_Anoxia     dpd_12 Anoxia
93      Yb5 Early_Anoxia     dpd_12 Anoxia
94      Yb6 Early_Anoxia     dpd_12 Anoxia
95      Yc1  Late_Anoxia     dpd_12 Anoxia
96      Yc2  Late_Anoxia     dpd_12 Anoxia
97      Yc3  Late_Anoxia     dpd_12 Anoxia
98      Yc4  Late_Anoxia     dpd_12 Anoxia
99      Yc5  Late_Anoxia     dpd_12 Anoxia
100     Yc6  Late_Anoxia     dpd_12 Anoxia
101     Yd1    Early_Rec     dpd_12 Anoxia
102     Yd2    Early_Rec     dpd_12 Anoxia
103     Yd3    Early_Rec     dpd_12 Anoxia
104     Yd4    Early_Rec     dpd_12 Anoxia
105     Yd5    Early_Rec     dpd_12 Anoxia
106     Yd6    Early_Rec     dpd_12 Anoxia
107     Ye1     Late_Rec     dpd_12 Anoxia
108     Ye2     Late_Rec     dpd_12 Anoxia
109     Ye3     Late_Rec     dpd_12 Anoxia
110     Ye4     Late_Rec     dpd_12 Anoxia
111     Ye5     Late_Rec     dpd_12 Anoxia
112     Ye6     Late_Rec     dpd_12 Anoxia
113     Za1           t0     dpd_20 Anoxia
114     Za2           t0     dpd_20 Anoxia
115     Za3           t0     dpd_20 Anoxia
116     Za4           t0     dpd_20 Anoxia
117     Zb1 Early_Anoxia     dpd_20 Anoxia
118     Zb2 Early_Anoxia     dpd_20 Anoxia
119     Zb3 Early_Anoxia     dpd_20 Anoxia
120     Zb4 Early_Anoxia     dpd_20 Anoxia
121     Zc1  Late_Anoxia     dpd_20 Anoxia
122     Zc2  Late_Anoxia     dpd_20 Anoxia
123     Zc3  Late_Anoxia     dpd_20 Anoxia
124     Zc4  Late_Anoxia     dpd_20 Anoxia
125     Zd1    Early_Rec     dpd_20 Anoxia
126     Zd2    Early_Rec     dpd_20 Anoxia
127     Zd3    Early_Rec     dpd_20 Anoxia
128     Zd4    Early_Rec     dpd_20 Anoxia
129     Ze1     Late_Rec     dpd_20 Anoxia
130     Ze2     Late_Rec     dpd_20 Anoxia
131     Ze3     Late_Rec     dpd_20 Anoxia
132     Ze4     Late_Rec     dpd_20 Anoxia

> dds <- DESeqDataSet(se, design = ~ group)

(this I had to troubleshoot extensively but is likely the contributor to my problems)

> dds
class: DESeqDataSet
dim: 26751 132
exptData(0):
assays(1): counts
rownames(26751): 12S 16S ... zzef1 zzz3
rowRanges metadata column names(0):
colnames(132): A1 A2 ... Ze3 Ze4
colData names(4): samples condition stage group

> dds <- estimateSizeFactors(dds)

> mnc <- rowMeans(counts(dds, normalized=TRUE))

> dds1 <- dds[ mnc > 1, ]

> dds1
class: DESeqDataSet
dim: 25945 132
exptData(0):
assays(1): counts
rownames(25945): 12S 16S ... zzef1 zzz3
rowRanges metadata column names(0):
colnames(132): A1 A2 ... Ze3 Ze4
colData names(5): samples condition stage group sizeFactor

> dds1.Epiboly <- dds1[dds1$group == "Temp", dds1$stage == "Epiboly", design= ~condition ]

> dds1.Epiboly <- DESeq(dds1.Epiboly)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  Levels or combinations of levels without any samples have resulted in
  column(s) of zeros in the model matrix.
See the section 'Model matrix not full rank' in vignette('DESeq2')

Thank you for any help with this!

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DESeq2_1.8.2              RcppArmadillo_0.6.500.4.0
 [3] Rcpp_0.12.3               GenomicFeatures_1.20.6   
 [5] AnnotationDbi_1.30.1      Biobase_2.28.0           
 [7] BiocParallel_1.2.22       GenomicAlignments_1.4.2  
 [9] Rsamtools_1.20.5          Biostrings_2.36.4        
[11] XVector_0.8.0             GenomicRanges_1.20.8     
[13] GenomeInfoDb_1.4.3        IRanges_2.2.9            
[15] S4Vectors_0.6.6           BiocGenerics_0.14.0      

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3          
 [4] bitops_1.0-6         futile.options_1.0.0 tools_3.2.1         
 [7] zlibbioc_1.14.0      rpart_4.1-10         biomaRt_2.24.1      
[10] annotate_1.46.1      RSQLite_1.0.0        gtable_0.1.2        
[13] lattice_0.20-33      DBI_0.3.1            gridExtra_2.0.0     
[16] genefilter_1.50.0    cluster_2.0.3        rtracklayer_1.28.10 
[19] locfit_1.5-9.1       nnet_7.3-12          grid_3.2.1          
[22] XML_3.98-1.3         survival_2.38-3      foreign_0.8-66      
[25] latticeExtra_0.6-26  Formula_1.2-1        geneplotter_1.46.0  
[28] ggplot2_2.0.0        lambda.r_1.1.7       scales_0.3.0        
[31] Hmisc_3.17-1         splines_3.2.1        xtable_1.8-0        
[34] colorspace_1.2-6     acepack_1.3-3.3      RCurl_1.95-4.7      
[37] munsell_0.4.2

deseq2 model.matrix batch effect rnaseq • 1.6k views

ADD COMMENT • link 7.5 years ago arom2 • 0

0

Entering edit mode

Hi Michael,

Thank you! This is a much better approach.

-Amie

ADD REPLY • link 7.5 years ago arom2 • 0

score 0 · Answer 1 · 2016-10-04

hi,

I don't really see the value in putting the two datasets together for normalization. Normalization works best when the other samples in the dataset are relatively similar except for a minority of DE genes (note that it's fairly robust to having many DE genes, but I don't see why it would be better for normalization to put the datasets together when you won't be making comparisons across). Putting the datasets together just causes hassle in how to formulate designs that aren't confounded.

If you really need to make one large matrix for plotting normalized counts, you could do something such as:

cts1 <- counts(dds1, normalized=TRUE)
cts2 <- counts(dds2, normalized=TRUE)
log.ratio <- rowMeans(log(cts1)) - rowMeans(log(cts2))
matrix.scale.factor <- exp(median(log.ratio[is.finite(log.ratio)]))
cts.full <- cbind(cts1, cts2 * matrix.scale.factor)