Dear all,
I have count data from healthy and diseased patients. I have 6 healthy samples vs 14 samples. The diseased patient underwent different treatment types and had biomarkers measured. Because the healthy doner did not undergo any treatment and were not measured for biomarkers I have problems creating a design because of this.
I am interested in genes deferentially expressed between healthy and diseased but also looking at the treatment type effect and the biomarker signature type vs healthy. I also have a classification in disease types.
Should I try ignoreRank=TRUE or would it be best to create a new design without the healthy group and then define a new control group in diseased patients based on treatment and biomarkers? But then, how can I do the comparison with the healthy group afterwards?
> colDATA
row.names condition type treatment biomarkers 01 Healthy1 Healthy nonDiseased non non 02 Healthy2 Healthy nonDiseased non non 03 Healthy3 Healthy nonDiseased non non 04 Healthy4 Healthy nonDiseased non non 05 Healthy5 Healthy nonDiseased non non 06 Healthy6 Healthy nonDiseased non non 07 Diseased1 Diseased Affected 1 2 08 Diseased2 Diseased Affected 2 2 09 Diseased3 Diseased Affected 1 1 10 Diseased4 Diseased Affected 3 2 11 Diseased5 Diseased Affected 2 0 12 Diseased6 Diseased Affected 1 2 13 Diseased7 Diseased Affected 1 2 14 Diseased8 Diseased nonAffected 1 1 15 Diseased9 Diseased Affected 2 2 16 Diseased10 Diseased nonAffected 1 1 17 Diseased11 Diseased11 Affected 3 1 18 Diseased12 Diseased12 Affected 1 2 19 Diseased13 Diseased13 nonAffected 2 1 20 Diseased14 Diseased14 nonAffected 1 2
If I try to create a design using only condition and type or try and other cobination, I get the following error:
dds <- DESeqDataSetFromMatrix(countData = countTABLE, + colData = colDATA, + design = ~ condition + type) Error in DESeqDataSet(se, design = design, ignoreRank) : the model matrix is not full rank, so the model cannot be fit as specified. one or more variables or interaction terms in the design formula are linear combinations of the others and must be removed
I think because type, treatment and biomarkers are nested within the diseased condition this will not work?
How can I make the correct design to get the comparisons?
Many thanks for your help.
Session info:
> sessionInfo() R version 3.1.2 (2014-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.22.1 RColorBrewer_1.1-2 ggplot2_1.0.0 gplots_2.16.0 Biobase_2.26.0 pasilla_0.5.1 [7] DESeq2_1.6.3 RcppArmadillo_0.4.600.0 Rcpp_0.11.3 GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 IRanges_2.0.1 [13] S4Vectors_0.4.0 BiocGenerics_0.12.1 loaded via a namespace (and not attached): [1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 BiocParallel_1.0.0 [8] bitops_1.0-6 brew_1.0-6 caTools_1.17.1 checkmate_1.5.1 cluster_1.15.3 codetools_0.2-9 colorspace_1.2-4 [15] DBI_0.3.1 digest_0.6.8 fail_1.2 foreach_1.4.2 foreign_0.8-62 Formula_1.1-2 gdata_2.13.3 [22] genefilter_1.48.1 geneplotter_1.44.0 grid_3.1.2 gtable_0.1.2 gtools_3.4.1 Hmisc_3.14-6 iterators_1.0.7 [29] KernSmooth_2.23-13 lattice_0.20-29 latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-35 munsell_0.4.2 nnet_7.3-8 [36] plyr_1.8.1 proto_0.3-10 reshape2_1.4.1 rpart_4.1-8 RSQLite_1.0.0 scales_0.2.4 sendmailR_1.2-1 [43] splines_3.1.2 stringr_0.6.2 survival_2.37-7 tools_3.1.2 XML_3.98-1.1 xtable_1.7-4 XVector_0.6.0