Hi,
I have a bit of experience with simpler model designs in DESeq2. This is a slightly more complex design than usual so wanted to check how I might run the analysis.
I have 35 samples from rats fed normal chow (NC) or high fat diet (HFD). These animals are bred to be susceptible to tumours. They were sacrificed, then tumours and normal tissue from the same organ as tumour were removed. So the metadata I have is:
>conds sampleID Tissue Individual Diet 1 S1 Normal 1 NC 2 S2 Tumour 1 NC 3 S3 Tumour 1 NC 4 S4 Tumour 1 NC 5 S5 Normal 2 NC 6 S6 Normal 3 NC 7 S7 Tumour 4 NC 8 S8 Tumour 4 NC 9 S9 Tumour 5 NC 10 S10 Tumour 5 NC 11 S11 Tumour 5 NC 12 S12 Normal 5 NC 13 S13 Tumour 6 NC 14 S14 Tumour 6 NC 15 S15 Tumour 6 NC 16 S16 Tumour 6 NC 17 S17 Tumour 7 NC 18 S18 Tumour 7 NC 19 S19 Tumour 7 NC 20 S20 Normal 7 NC 21 S21 Normal 8 HFD 22 S22 Normal 10 HFD 23 S23 Tumour 11 HFD 24 S24 Tumour 11 HFD 25 S25 Tumour 11 HFD 26 S26 Tumour 12 HFD 27 S27 Tumour 12 HFD 28 S28 Tumour 12 HFD 29 S29 Tumour 12 HFD 30 S30 Normal 13 HFD 31 S31 Tumour 21 HFD 32 S32 Tumour 22 HFD 33 S33 Tumour 22 HFD 34 S34 Tumour 22 HFD 35 S35 Normal 22 HFD
As you can see, the first individual has a single normal sample, and 3 tumours. These relate to the samples taken at 'surgery', so not just technical replicates of same library sequenced multiple times (hence I believe I should not use collapseReplicates
). As you can also see, not every individual has a tumour and normal sample.
The design I have tried is:
dds <- DESeqDataSetFromMatrix(nz.co, colData = conds, design =~ Tissue + Diet + Tissue:Diet)
where nz.co
is raw counts with any genes with rowSums==0
removed. I feel I should use Individual
in the design but get full rank error (I believe) because of the above issue that some individuals are tumour or normal only.
Any ideas on what to do here would be of great help and much appreciated.
Bruce.
##############
### Update ###
###########
As per Michael's answer below:
dds <- DESeqDataSetFromMatrix(nz.co, colData = conds, design =~ Tissue + Diet) dds$Tissue <- relevel(dds$Tissue, ref="Normal") dds$IndividualN <- factor(c(1,1,1,1,2,3,4,4,5,5,5,5,6,6,6,6,7,7,7,7,1,2,3,3,3,4,4,4,4,5,6,7,7,7,7)) dds <- DESeq(dds) res <- results(dds,contrast=list("DietNC.Tumour","DietNC.Normal"))
###sessionInfo()
R version 3.5.1 (2018-07-02) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) Matrix products: default BLAS: /apps/builddir/R-3.5.1/lib/libRblas.so LAPACK: /apps/builddir/R-3.5.1/lib/libRlapack.so locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] bindrcpp_0.2.2 biomaRt_2.36.1 [3] forcats_0.3.0 stringr_1.3.1 [5] dplyr_0.7.7 purrr_0.2.5 [7] readr_1.1.1 tidyr_0.8.1 [9] tibble_1.4.2 tidyverse_1.2.1 [11] ggplot2_3.0.0 apeglm_1.2.1 [13] DESeq2_1.20.0 SummarizedExperiment_1.10.1 [15] DelayedArray_0.6.6 BiocParallel_1.14.2 [17] matrixStats_0.54.0 Biobase_2.40.0 [19] GenomicRanges_1.32.7 GenomeInfoDb_1.16.0 [21] IRanges_2.14.12 S4Vectors_0.18.3 [23] BiocGenerics_0.26.0 loaded via a namespace (and not attached): [1] nlme_3.1-137 bitops_1.0-6 lubridate_1.7.4 [4] bit64_0.9-7 progress_1.2.0 RColorBrewer_1.1-2 [7] httr_1.3.1 numDeriv_2016.8-1 tools_3.5.1 [10] backports_1.1.2 utf8_1.1.4 R6_2.3.0 [13] rpart_4.1-13 Hmisc_4.1-1 DBI_1.0.0 [16] lazyeval_0.2.1 colorspace_1.3-2 nnet_7.3-12 [19] withr_2.1.2 prettyunits_1.0.2 tidyselect_0.2.5 [22] gridExtra_2.3 curl_3.2 bit_1.1-14 [25] compiler_3.5.1 cli_1.0.1 rvest_0.3.2 [28] htmlTable_1.12 xml2_1.2.0 scales_1.0.0 [31] checkmate_1.8.5 genefilter_1.62.0 digest_0.6.18 [34] foreign_0.8-71 XVector_0.20.0 base64enc_0.1-3 [37] pkgconfig_2.0.2 htmltools_0.3.6 bbmle_1.0.20 [40] readxl_1.1.0 htmlwidgets_1.3 rlang_0.2.2 [43] rstudioapi_0.8 RSQLite_2.1.1 bindr_0.1.1 [46] jsonlite_1.5 acepack_1.4.1 RCurl_1.95-4.11 [49] magrittr_1.5 GenomeInfoDbData_1.1.0 Formula_1.2-3 [52] Matrix_1.2-14 fansi_0.4.0 Rcpp_0.12.19 [55] munsell_0.5.0 stringi_1.2.4 MASS_7.3-51 [58] zlibbioc_1.26.0 plyr_1.8.4 grid_3.5.1 [61] blob_1.1.1 crayon_1.3.4 lattice_0.20-35 [64] haven_1.1.2 splines_3.5.1 annotate_1.58.0 [67] hms_0.4.2 locfit_1.5-9.1 knitr_1.20 [70] pillar_1.3.0 geneplotter_1.58.0 XML_3.98-1.16 [73] glue_1.3.0 latticeExtra_0.6-28 modelr_0.1.2 [76] data.table_1.11.8 cellranger_1.1.0 gtable_0.2.0 [79] assertthat_0.2.0 emdbook_1.3.10 xtable_1.8-3 [82] broom_0.5.0 coda_0.19-2 survival_2.42-6 [85] AnnotationDbi_1.42.1 memoise_1.1.0 cluster_2.0.7-1
Hi Michael,
I RTFM'd again and came across the section you link, had remembered this but thought I would not be able to use as I do not have
Normal
andTumour
in allIndividual
. The edit I made above runs without error, which I did not think would be the case. However, this is not accounting for variation due toIndividual
, or else there are some batch effects (although I am assured by those in lab that there are none...) based on PCA plots.I will look into limma-voom method you mention. Thanks for your time and continued support of DESeq2,
Bruce.