Hi Community, I would like to get feedback about model designs and contrast dealing the issue of "Model not full rank" in DESeq2.
The sample meta info:
library(magrittr)
s_meta <- data.frame(grp = paste(rep(c("D", "L", "M", "H"), times = 2), rep(c("6h", "24h"), each = 4), sep = "_") %>%
rep(times = 12) %>%
factor(levels = c("D_6h", "L_6h", "M_6h", "H_6h", "D_24h", "L_24h", "M_24h", "H_24h")),
b = rep(paste0("b", 1:4), times = c(16, 24, 32, 24)) %>% factor,
id = paste0("U", rep(1:12, each = 8)))
Patients info:
s_meta
grp b id
1 D_6h b1 U1
2 L_6h b1 U1
3 M_6h b1 U1
4 H_6h b1 U1
5 D_24h b1 U1
6 L_24h b1 U1
7 M_24h b1 U1
8 H_24h b1 U1
9 D_6h b1 U2
10 L_6h b1 U2
11 M_6h b1 U2
12 H_6h b1 U2
13 D_24h b1 U2
14 L_24h b1 U2
15 M_24h b1 U2
16 H_24h b1 U2
...
81 D_6h b4 U11
82 L_6h b4 U11
83 M_6h b4 U11
84 H_6h b4 U11
85 D_24h b4 U11
86 L_24h b4 U11
87 M_24h b4 U11
88 H_24h b4 U11
89 D_6h b4 U12
90 L_6h b4 U12
91 M_6h b4 U12
92 H_6h b4 U12
93 D_24h b4 U12
94 L_24h b4 U12
95 M_24h b4 U12
96 H_24h b4 U12
Basically, 12 patients enrolled in a dosage (L, M, H; D is the control ) dependent time-course (6h and 24h) experiment. We are interested in the difference between treatment and controls (eg. "L_6h" vs "D_6h"). I have used a simple design, design(dds) <- formula(~ id + gr_2)
, where id is the patients, and gr_2 represents condtions ("D_6h", "L_6h", etc)
However, later we found there is batch effect (indicated in column b
in the s_meta
, 4 subgroups). Simply change the design to design(dds) <- formula(~ id + b + gr_2)
caused "Matrix not full rank" error.
Following the section of "model-matrix-not-full-rank" of link,
we get the new design:
#- Set the id number
s_meta$idn <- rep(c(1:2, 1:3, 1:4, 1:3), each = 8) %>% factor
s_meta
grp b id idn
1 D_6h b1 U1 1
2 L_6h b1 U1 1
3 M_6h b1 U1 1
4 H_6h b1 U1 1
5 D_24h b1 U1 1
6 L_24h b1 U1 1
7 M_24h b1 U1 1
8 H_24h b1 U1 1
9 D_6h b1 U2 2
10 L_6h b1 U2 2
11 M_6h b1 U2 2
12 H_6h b1 U2 2
13 D_24h b1 U2 2
14 L_24h b1 U2 2
15 M_24h b1 U2 2
16 H_24h b1 U2 2
...
81 D_6h b4 U11 2
82 L_6h b4 U11 2
83 M_6h b4 U11 2
84 H_6h b4 U11 2
85 D_24h b4 U11 2
86 L_24h b4 U11 2
87 M_24h b4 U11 2
88 H_24h b4 U11 2
89 D_6h b4 U12 3
90 L_6h b4 U12 3
91 M_6h b4 U12 3
92 H_6h b4 U12 3
93 D_24h b4 U12 3
94 L_24h b4 U12 3
95 M_24h b4 U12 3
96 H_24h b4 U12 3
Modifications of designs.
#- assign to the colData
dds$b <- s_meta$b
dds$idn <- s_meta$idn
dds$grp <- s_meta$grp
#- rm zero columns as the data is unbalanced. Also no intercept.
m1 <- model.matrix(~ 0 + b + b:idn + b:grp, colData(dds))
a_zero <- apply(m1, 2, function(x) all(x == 0))
idx <- which(a_zero)
m1 <- m1[, -idx]
#- !! new design. betaPrior is set to F.
design(dds) <- formula(~ 0 + b + b:idn + b:grp)
dds <- DESeq(dds, parallel = TRUE, full = m1, , betaPrior = FALSE)
And names
resultsNames(dds)
# [1] "bb1" "bb2" "bb3" "bb4"
# [5] "bb1.idn2" "bb2.idn2" "bb3.idn2" "bb4.idn2"
# [9] "bb2.idn3" "bb3.idn3" "bb4.idn3" "bb3.idn4"
# [13] "bb1.grpD_24h" "bb2.grpD_24h" "bb3.grpD_24h" "bb4.grpD_24h"
# [17] "bb1.grpL_6h" "bb2.grpL_6h" "bb3.grpL_6h" "bb4.grpL_6h"
# [21] "bb1.grpL_24h" "bb2.grpL_24h" "bb3.grpL_24h" "bb4.grpL_24h"
# [25] "bb1.grpM_6h" "bb2.grpM_6h" "bb3.grpM_6h" "bb4.grpM_6h"
# [29] "bb1.grpM_24h" "bb2.grpM_24h" "bb3.grpM_24h" "bb4.grpM_24h"
# [33] "bb1.grpH_6h" "bb2.grpH_6h" "bb3.grpH_6h" "bb4.grpH_6h"
# [37] "bb1.grpH_24h" "bb2.grpH_24h" "bb3.grpH_24h" "bb4.grpH_24h"
We are interested in the difference between treatment and control.
To get the significant test results of H_6h
vs D_6h
(The denominator groups are ignored):
#- H_6h vs. D_6h. listValues is important otherwise the fold changes do NOT reflect average effect.
#- In the examples of results with list, the listValues is NOT set as interaction effect is supposed to add to main.
xx_1 <- results(dds, list(c("bb1:grpH_6", "bb2:grpH_6", "bb3:grpH_6", "bb4:grpH_6")), listValues = c(1/4, -1/4))
To get the significant test results of H_24h
vs D_24h
(the denominator groups are sepecified)
#- H_24h vs. D_24h
xx_2 <- results(dds, list(c("bb1.grpH_24h", "bb2.grpH_24h", "bb3.grpH_24h", "bb4.grpH_24h"),
c("bb1.grp_24h", "bb2.grpD_24h", "bb3.grpD_24h", "bb4.grpD_24h")), listValues = c(1/4, -1/4))
That's first time I dealt wit the complex design and contrast, and I am not sure whether I have done it properly.
Please let me know if there are any issues, and general suggestions are greatly appreciated! Thanks a lot!
Session
sessionInfo( )
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] BiocParallel_1.33.11 DESeq2_1.40.1
[3] SummarizedExperiment_1.29.1 Biobase_2.59.0
[5] MatrixGenerics_1.12.0 matrixStats_1.0.0
[7] GenomicRanges_1.51.4 GenomeInfoDb_1.36.0
[9] IRanges_2.33.1 S4Vectors_0.37.4
[11] BiocGenerics_0.46.0 future.apply_1.11.0
[13] future_1.32.0 data.table_1.14.10
[15] magrittr_2.0.3 patchwork_1.1.2
[17] ggplot2_3.4.2 ll_0.1.0
[19] colorout_1.2-1
loaded via a namespace (and not attached):
[1] utf8_1.2.3 generics_0.1.3 bitops_1.0-7
[4] lattice_0.21-8 listenv_0.9.0 digest_0.6.31
[7] grid_4.3.0 iterators_1.0.14 foreach_1.5.2
[10] Matrix_1.5-4 fansi_1.0.4 scales_1.2.1
[13] codetools_0.2-19 RApiSerialize_0.1.2 cli_3.6.1
[16] crayon_1.5.2 rlang_1.1.1 XVector_0.39.0
[19] parallelly_1.36.0 munsell_0.5.0 withr_2.5.0
[22] DelayedArray_0.26.3 S4Arrays_1.0.1 qs_0.25.5
[25] tools_4.3.0 parallel_4.3.0 dplyr_1.1.2
[28] colorspace_2.1-0 locfit_1.5-9.7 GenomeInfoDbData_1.2.10
[31] globals_0.16.2 vctrs_0.6.2 R6_2.5.1
[34] lifecycle_1.0.3 zlibbioc_1.45.0 stringfish_0.15.8
[37] pkgconfig_2.0.3 RcppParallel_5.1.7 pillar_1.9.0
[40] gtable_0.3.3 Rcpp_1.0.10 glue_1.6.2
[43] xfun_0.39 tibble_3.2.1 tidyselect_1.2.0
[46] knitr_1.42 compiler_4.3.0 RCurl_1.98-1.12
Thanks for your ideas. I talked with one of my colleagues about the data and we think the model looks fine. A few updates have been made to my codes (see posts).