Question: DESeq2 Design with Multiple Replicates of Individuals including Tumour and Normal
gravatar for bruce.moran
10 months ago by
bruce.moran20 wrote:


I have a bit of experience with simpler model designs in DESeq2. This is a slightly more complex design than usual so wanted to check how I might run the analysis.

I have 35 samples from rats fed normal chow (NC) or high fat diet (HFD). These animals are bred to be susceptible to tumours. They were sacrificed, then tumours and normal tissue from the same organ as tumour were removed. So the metadata I have is:

   sampleID Tissue Individual Diet
1        S1 Normal          1   NC
2        S2 Tumour          1   NC
3        S3 Tumour          1   NC
4        S4 Tumour          1   NC
5        S5 Normal          2   NC
6        S6 Normal          3   NC
7        S7 Tumour          4   NC
8        S8 Tumour          4   NC
9        S9 Tumour          5   NC
10      S10 Tumour          5   NC
11      S11 Tumour          5   NC
12      S12 Normal          5   NC
13      S13 Tumour          6   NC
14      S14 Tumour          6   NC
15      S15 Tumour          6   NC
16      S16 Tumour          6   NC
17      S17 Tumour          7   NC
18      S18 Tumour          7   NC
19      S19 Tumour          7   NC
20      S20 Normal          7   NC
21      S21 Normal          8  HFD
22      S22 Normal         10  HFD
23      S23 Tumour         11  HFD
24      S24 Tumour         11  HFD
25      S25 Tumour         11  HFD
26      S26 Tumour         12  HFD
27      S27 Tumour         12  HFD
28      S28 Tumour         12  HFD
29      S29 Tumour         12  HFD
30      S30 Normal         13  HFD
31      S31 Tumour         21  HFD
32      S32 Tumour         22  HFD
33      S33 Tumour         22  HFD
34      S34 Tumour         22  HFD
35      S35 Normal         22  HFD

As you can see, the first individual has a single normal sample, and 3 tumours. These relate to the samples taken at 'surgery', so not just technical replicates of same library sequenced multiple times (hence I believe I should not use collapseReplicates). As you can also see, not every individual has a tumour and normal sample.

The design I have tried is:

dds <- DESeqDataSetFromMatrix(,
                              colData = conds,
                              design =~ Tissue + Diet + Tissue:Diet)

where is raw counts with any genes with rowSums==0 removed. I feel I should use Individual in the design but get full rank error (I believe) because of the above issue that some individuals are tumour or normal only.

Any ideas on what to do here would be of great help and much appreciated.




### Update ###


As per Michael's answer below:

dds <- DESeqDataSetFromMatrix(,
                              colData = conds,
                              design =~ Tissue + Diet)
dds$Tissue <- relevel(dds$Tissue, ref="Normal")
dds$IndividualN <- factor(c(1,1,1,1,2,3,4,4,5,5,5,5,6,6,6,6,7,7,7,7,1,2,3,3,3,4,4,4,4,5,6,7,7,7,7))
dds <- DESeq(dds)
res <- results(dds,contrast=list("DietNC.Tumour","DietNC.Normal"))



R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/builddir/R-3.5.1/lib/
LAPACK: /apps/builddir/R-3.5.1/lib/

 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] bindrcpp_0.2.2              biomaRt_2.36.1
 [3] forcats_0.3.0               stringr_1.3.1
 [5] dplyr_0.7.7                 purrr_0.2.5
 [7] readr_1.1.1                 tidyr_0.8.1
 [9] tibble_1.4.2                tidyverse_1.2.1
[11] ggplot2_3.0.0               apeglm_1.2.1
[13] DESeq2_1.20.0               SummarizedExperiment_1.10.1
[15] DelayedArray_0.6.6          BiocParallel_1.14.2
[17] matrixStats_0.54.0          Biobase_2.40.0
[19] GenomicRanges_1.32.7        GenomeInfoDb_1.16.0
[21] IRanges_2.14.12             S4Vectors_0.18.3
[23] BiocGenerics_0.26.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-137           bitops_1.0-6           lubridate_1.7.4
 [4] bit64_0.9-7            progress_1.2.0         RColorBrewer_1.1-2
 [7] httr_1.3.1             numDeriv_2016.8-1      tools_3.5.1
[10] backports_1.1.2        utf8_1.1.4             R6_2.3.0
[13] rpart_4.1-13           Hmisc_4.1-1            DBI_1.0.0
[16] lazyeval_0.2.1         colorspace_1.3-2       nnet_7.3-12
[19] withr_2.1.2            prettyunits_1.0.2      tidyselect_0.2.5
[22] gridExtra_2.3          curl_3.2               bit_1.1-14
[25] compiler_3.5.1         cli_1.0.1              rvest_0.3.2
[28] htmlTable_1.12         xml2_1.2.0             scales_1.0.0
[31] checkmate_1.8.5        genefilter_1.62.0      digest_0.6.18
[34] foreign_0.8-71         XVector_0.20.0         base64enc_0.1-3
[37] pkgconfig_2.0.2        htmltools_0.3.6        bbmle_1.0.20
[40] readxl_1.1.0           htmlwidgets_1.3        rlang_0.2.2
[43] rstudioapi_0.8         RSQLite_2.1.1          bindr_0.1.1
[46] jsonlite_1.5           acepack_1.4.1          RCurl_1.95-4.11
[49] magrittr_1.5           GenomeInfoDbData_1.1.0 Formula_1.2-3
[52] Matrix_1.2-14          fansi_0.4.0            Rcpp_0.12.19
[55] munsell_0.5.0          stringi_1.2.4          MASS_7.3-51
[58] zlibbioc_1.26.0        plyr_1.8.4             grid_3.5.1
[61] blob_1.1.1             crayon_1.3.4           lattice_0.20-35
[64] haven_1.1.2            splines_3.5.1          annotate_1.58.0
[67] hms_0.4.2              locfit_1.5-9.1         knitr_1.20
[70] pillar_1.3.0           geneplotter_1.58.0     XML_3.98-1.16
[73] glue_1.3.0             latticeExtra_0.6-28    modelr_0.1.2
[76] data.table_1.11.8      cellranger_1.1.0       gtable_0.2.0
[79] assertthat_0.2.0       emdbook_1.3.10         xtable_1.8-3
[82] broom_0.5.0            coda_0.19-2            survival_2.42-6
[85] AnnotationDbi_1.42.1   memoise_1.1.0          cluster_2.0.7-1


ADD COMMENTlink modified 10 months ago • written 10 months ago by bruce.moran20
Answer: DESeq2 Design with Multiple Replicates of Individuals including Tumour and Norma
gravatar for Michael Love
10 months ago by
Michael Love25k
United States
Michael Love25k wrote:

hi Bruce,

We have a section of the vignette that addresses what to do here in the case that all samples have both levels of "condition", in your case "tissue". I put the link below. However, this approach cannot deal with some samples having on one or the other level of tissue (only normal or only tumor), then a mix of some individuals with both levels. As far as I know the only approach you can take there is to use limma-voom with duplicateCorrelation() to try to account for some amount of correlation across some subset of the samples, due to individual.

ADD COMMENTlink written 10 months ago by Michael Love25k

Hi Michael,

I RTFM'd again and came across the section you link, had remembered this but thought I would not be able to use as I do not have Normal and Tumour in all Individual. The edit I made above runs without error, which I did not think would be the case. However, this is not accounting for variation due to Individual, or else there are some batch effects (although I am assured by those in lab that there are none...) based on PCA plots.

I will look into limma-voom method you mention. Thanks for your time and continued support of DESeq2,



ADD REPLYlink modified 10 months ago • written 10 months ago by bruce.moran20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 318 users visited in the last hour