Hi, I am currently trying to run a deferential gene expression analysis to determine the the impact of Ewed (maternal diet) , Lambd (offspring diet) and sex (gender) on the mRNA expression of perirenal fat (per). My experimental design, therefore is three way interactions (Ewed: Lambd: Sex) and I would also like to test for the interaction between Ewed:Lambd, Ewed:Sex.
This is the dataset that I used;
ID.Ewed.Lambd.Sex.rep
1 1768,HIGH,CONV,M,1
2 1796,HIGH,CONV,M,2
3 1774,HIGH,CONV,F,1
4 1778,HIGH,CONV,F,2
5 1788,HIGH,CONV,F,3
6 1755,HIGH,HCHF,M,1
7 1787,HIGH,HCHF,M,2
8 1799,HIGH,HCHF,M,3
9 1767,HIGH,HCHF,F,1
10 1777,HIGH,HCHF,F,2
11 1823,HIGH,HCHF,F,3
12 1772,LOW,CONV,M,1
13 1784,LOW,CONV,M,2
14 1816,LOW,CONV,M,3
15 1822,LOW,CONV,M,4
16 1766,LOW,CONV,F,1
17 1810,LOW,CONV,F,2
18 1818,LOW,CONV,F,3
19 1826,LOW,CONV,F,4
20 1828,LOW,CONV,F,5
21 1759,LOW,HCHF,M,1
22 1783,LOW,HCHF,M,2
23 1785,LOW,HCHF,M,3
24 1821,LOW,HCHF,M,4
25 1775,LOW,HCHF,F,1
26 1807,LOW,HCHF,F,2
27 1827,LOW,HCHF,F,3
I try to run the analysis using the following model matrix (without the interaction); where I also include the covariates including Lambbw6 (body weight at 6 months) + Birthw (birth weight) + Lambbw2.5 (body weight at 2 1/2 years old)
design_per <- model.matrix(~ -1 + per$Lambbw6 + per$Birthw + per$Lambbw2.5 + per$Sex + per$Lambd + per$Ewed, per)
design_per
the model matrix obtained;
design_per per$Lambbw6 per$Birthw per$Lambbw2.5 per$Sex per$SexF per$SexM per$LambdCONV per$LambdHCHF per$EwedHIGH per$EwedLOW 1 52.5 4.520 98.3 0 0 1 0 1 1 0 2 40.6 2.630 98.6 0 0 1 0 1 0 1 3 25.1 4.610 94.0 0 1 0 0 1 0 0 34 45.3 4.190 84.0 0 1 0 0 1 0 0 4 32.9 3.560 85.3 0 1 0 1 0 0 0 5 31.6 4.310 104.5 0 1 0 1 0 0 1 6 35.4 4.270 77.6 0 1 0 0 1 1 0 35 41.9 5.030 120.7 0 0 1 1 0 1 0 7 52.1 3.960 101.5 0 0 1 0 1 0 0 8 35.0 4.260 88.8 0 1 0 1 0 0 0 9 40.3 3.590 85.0 0 0 1 1 0 0 1 10 35.8 4.170 95.4 0 1 0 1 0 1 0 11 52.4 3.550 105.9 0 1 0 0 1 0 1 12 36.7 3.040 96.8 0 1 0 0 1 1 0 13 35.3 2.980 86.8 0 1 0 1 0 1 0 36 40.9 5.050 100.6 0 0 1 1 0 0 0 14 55.2 4.220 95.4 0 0 1 0 1 0 1 15 35.2 3.838 89.5 0 0 1 1 0 0 1 16 38.5 3.830 100.5 0 0 1 0 1 0 1 17 51.6 4.990 108.5 0 0 1 0 1 1 0 18 34.4 4.740 97.6 0 1 0 1 0 1 0 19 34.5 4.990 89.0 0 0 1 0 1 0 0 20 39.2 4.680 91.1 0 0 1 1 0 0 0 21 35.4 4.350 106.0 0 0 1 1 0 1 0 22 33.7 5.180 101.9 0 0 1 0 1 1 0 23 37.1 3.950 87.3 0 1 0 1 0 0 0 24 29.5 4.570 93.7 0 1 0 1 0 0 0 25 28.3 4.860 97.1 0 1 0 0 1 0 1 26 32.7 4.680 82.3 0 1 0 1 0 0 1 27 42.3 3.480 96.7 0 0 1 1 0 0 1 28 31.2 3.290 88.1 0 1 0 1 0 0 1 29 34.3 5.410 95.3 0 0 1 0 1 0 1 30 32.1 2.910 85.7 0 0 1 1 0 0 1 31 28.6 3.780 85.1 0 1 0 0 1 1 0 32 34.3 4.440 91.8 0 1 0 1 0 0 1 33 30.0 5.110 87.9 0 1 0 1 0 0 1 per$EwedNORM 1 0 2 0 3 1 34 1 4 1 5 0 6 0 35 0 7 1 8 1 9 0 10 0 11 0 12 0 13 0 36 1 14 0 15 0 16 0 17 0 18 0 19 1 20 1 21 0 22 0 23 1 24 1 25 0 26 0 27 0 28 0 29 0 30 0 31 0 32 0 33 0 attr(,"assign") [1] 1 2 3 4 4 4 5 5 6 6 6 attr(,"contrasts") attr(,"contrasts")$
per$Sex
[1] "contr.treatment"
attr(,"contrasts")$per$Lambd
[1] "contr.treatment"
attr(,"contrasts")$per$Ewed
[1] "contr.treatment"
dds_per <- DESeqDataSetFromHTSeqCount(sampleTable=per,
directory=pathht,
design = design_per)
but end up getting this ERROR
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
Levels or combinations of levels without any samples have resulted in
column(s) of zeros in the model matrix.
Please read the vignette section 'Model matrix not full rank':
vignette('DESeq2')
My question are: 1. How do I solve this problem (I believe that the problems are related to ; 1) linear dependency problem, where three variables contain exactly the same information; and 2) group-specific effect of a condition or treatment, and individual nested within groups.
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.22.2 SummarizedExperiment_1.12.0 DelayedArray_0.8.0 BiocParallel_1.16.2
[5] matrixStats_0.54.0 Biobase_2.42.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.1
[9] IRanges_2.16.0 S4Vectors_0.20.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.5.1 Formula_1.2-3 assertthat_0.2.0 BiocManager_1.30.4
[6] latticeExtra_0.6-28 blob_1.1.1 GenomeInfoDbData_1.2.0 pillar_1.3.1 RSQLite_2.1.1
[11] backports_1.1.3 lattice_0.20-38 glue_1.3.0 digest_0.6.18 RColorBrewer_1.1-2
[16] XVector_0.22.0 checkmate_1.9.1 colorspace_1.4-0 htmltools_0.3.6 Matrix_1.2-15
[21] plyr_1.8.4 XML_3.98-1.16 pkgconfig_2.0.2 genefilter_1.64.0 zlibbioc_1.28.0
[26] purrr_0.3.0 xtable_1.8-3 scales_1.0.0 htmlTable_1.13.1 tibble_2.0.1
[31] annotate_1.60.0 ggplot2_3.1.0 nnet_7.3-12 lazyeval_0.2.1 survival_2.43-3
[36] magrittr_1.5 crayon_1.3.4 memoise_1.1.0 foreign_0.8-71 tools_3.5.1
[41] data.table_1.12.0 stringr_1.3.1 locfit_1.5-9.1 munsell_0.5.0 cluster_2.0.7-1
[46] AnnotationDbi_1.44.0 bindrcpp_0.2.2 compiler_3.5.1 rlang_0.3.1 grid_3.5.1
[51] RCurl_1.95-4.11 rstudioapi_0.9.0 htmlwidgets_1.3 bitops_1.0-6 base64enc_0.1-3
[56] gtable_0.2.0 DBI_1.0.0 R6_2.3.0 gridExtra_2.3 knitr_1.21
[61] dplyr_0.7.8 bit_1.1-14 bindr_0.1.1 Hmisc_4.2-0 stringi_1.2.4
[66] Rcpp_1.0.0 geneplotter_1.60.0 rpart_4.1-13 acepack_1.4.1 tidyselect_0.2.5
[71] xfun_0.4
Thank you in advance!
Cheers, Mila
Thank you again for your reply. I completely agree with you that I should design the model based on the specific biological approach.
What I did for now is combining the two factors (Ewed & Lambd) as a group and used it in the design = ~ group + Sex