Question

DESeq2 Design with Multiple Replicates of Individuals including Tumour and Normal

0

Entering edit mode

bruce.moran ▴ 30

@brucemoran-8388

Last seen 4.1 years ago

Ireland

Hi,

I have a bit of experience with simpler model designs in DESeq2. This is a slightly more complex design than usual so wanted to check how I might run the analysis.

I have 35 samples from rats fed normal chow (NC) or high fat diet (HFD). These animals are bred to be susceptible to tumours. They were sacrificed, then tumours and normal tissue from the same organ as tumour were removed. So the metadata I have is:

>conds
   sampleID Tissue Individual Diet
1        S1 Normal          1   NC
2        S2 Tumour          1   NC
3        S3 Tumour          1   NC
4        S4 Tumour          1   NC
5        S5 Normal          2   NC
6        S6 Normal          3   NC
7        S7 Tumour          4   NC
8        S8 Tumour          4   NC
9        S9 Tumour          5   NC
10      S10 Tumour          5   NC
11      S11 Tumour          5   NC
12      S12 Normal          5   NC
13      S13 Tumour          6   NC
14      S14 Tumour          6   NC
15      S15 Tumour          6   NC
16      S16 Tumour          6   NC
17      S17 Tumour          7   NC
18      S18 Tumour          7   NC
19      S19 Tumour          7   NC
20      S20 Normal          7   NC
21      S21 Normal          8  HFD
22      S22 Normal         10  HFD
23      S23 Tumour         11  HFD
24      S24 Tumour         11  HFD
25      S25 Tumour         11  HFD
26      S26 Tumour         12  HFD
27      S27 Tumour         12  HFD
28      S28 Tumour         12  HFD
29      S29 Tumour         12  HFD
30      S30 Normal         13  HFD
31      S31 Tumour         21  HFD
32      S32 Tumour         22  HFD
33      S33 Tumour         22  HFD
34      S34 Tumour         22  HFD
35      S35 Normal         22  HFD

As you can see, the first individual has a single normal sample, and 3 tumours. These relate to the samples taken at 'surgery', so not just technical replicates of same library sequenced multiple times (hence I believe I should not use collapseReplicates). As you can also see, not every individual has a tumour and normal sample.

The design I have tried is:

dds <- DESeqDataSetFromMatrix(nz.co,
                              colData = conds,
                              design =~ Tissue + Diet + Tissue:Diet)

where nz.co is raw counts with any genes with rowSums==0 removed. I feel I should use Individual in the design but get full rank error (I believe) because of the above issue that some individuals are tumour or normal only.

Any ideas on what to do here would be of great help and much appreciated.

Bruce.

##############

### Update ###

###########

As per Michael's answer below:

dds <- DESeqDataSetFromMatrix(nz.co,
                              colData = conds,
                              design =~ Tissue + Diet)
dds$Tissue <- relevel(dds$Tissue, ref="Normal")
dds$IndividualN <- factor(c(1,1,1,1,2,3,4,4,5,5,5,5,6,6,6,6,7,7,7,7,1,2,3,3,3,4,4,4,4,5,6,7,7,7,7))
dds <- DESeq(dds)
res <- results(dds,contrast=list("DietNC.Tumour","DietNC.Normal"))

###sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/builddir/R-3.5.1/lib/libRblas.so
LAPACK: /apps/builddir/R-3.5.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] bindrcpp_0.2.2              biomaRt_2.36.1
 [3] forcats_0.3.0               stringr_1.3.1
 [5] dplyr_0.7.7                 purrr_0.2.5
 [7] readr_1.1.1                 tidyr_0.8.1
 [9] tibble_1.4.2                tidyverse_1.2.1
[11] ggplot2_3.0.0               apeglm_1.2.1
[13] DESeq2_1.20.0               SummarizedExperiment_1.10.1
[15] DelayedArray_0.6.6          BiocParallel_1.14.2
[17] matrixStats_0.54.0          Biobase_2.40.0
[19] GenomicRanges_1.32.7        GenomeInfoDb_1.16.0
[21] IRanges_2.14.12             S4Vectors_0.18.3
[23] BiocGenerics_0.26.0

loaded via a namespace (and not attached):
 [1] nlme_3.1-137           bitops_1.0-6           lubridate_1.7.4
 [4] bit64_0.9-7            progress_1.2.0         RColorBrewer_1.1-2
 [7] httr_1.3.1             numDeriv_2016.8-1      tools_3.5.1
[10] backports_1.1.2        utf8_1.1.4             R6_2.3.0
[13] rpart_4.1-13           Hmisc_4.1-1            DBI_1.0.0
[16] lazyeval_0.2.1         colorspace_1.3-2       nnet_7.3-12
[19] withr_2.1.2            prettyunits_1.0.2      tidyselect_0.2.5
[22] gridExtra_2.3          curl_3.2               bit_1.1-14
[25] compiler_3.5.1         cli_1.0.1              rvest_0.3.2
[28] htmlTable_1.12         xml2_1.2.0             scales_1.0.0
[31] checkmate_1.8.5        genefilter_1.62.0      digest_0.6.18
[34] foreign_0.8-71         XVector_0.20.0         base64enc_0.1-3
[37] pkgconfig_2.0.2        htmltools_0.3.6        bbmle_1.0.20
[40] readxl_1.1.0           htmlwidgets_1.3        rlang_0.2.2
[43] rstudioapi_0.8         RSQLite_2.1.1          bindr_0.1.1
[46] jsonlite_1.5           acepack_1.4.1          RCurl_1.95-4.11
[49] magrittr_1.5           GenomeInfoDbData_1.1.0 Formula_1.2-3
[52] Matrix_1.2-14          fansi_0.4.0            Rcpp_0.12.19
[55] munsell_0.5.0          stringi_1.2.4          MASS_7.3-51
[58] zlibbioc_1.26.0        plyr_1.8.4             grid_3.5.1
[61] blob_1.1.1             crayon_1.3.4           lattice_0.20-35
[64] haven_1.1.2            splines_3.5.1          annotate_1.58.0
[67] hms_0.4.2              locfit_1.5-9.1         knitr_1.20
[70] pillar_1.3.0           geneplotter_1.58.0     XML_3.98-1.16
[73] glue_1.3.0             latticeExtra_0.6-28    modelr_0.1.2
[76] data.table_1.11.8      cellranger_1.1.0       gtable_0.2.0
[79] assertthat_0.2.0       emdbook_1.3.10         xtable_1.8-3
[82] broom_0.5.0            coda_0.19-2            survival_2.42-6
[85] AnnotationDbi_1.42.1   memoise_1.1.0          cluster_2.0.7-1

DESeq2 design and contrast matrix • 1.8k views

ADD COMMENT • link 7.1 years ago bruce.moran ▴ 30

score 0 · Answer 1 · 2018-11-26

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 hours ago

United States

hi Bruce,

We have a section of the vignette that addresses what to do here in the case that all samples have both levels of "condition", in your case "tissue". I put the link below. However, this approach cannot deal with some samples having on one or the other level of tissue (only normal or only tumor), then a mix of some individuals with both levels. As far as I know the only approach you can take there is to use limma-voom with duplicateCorrelation() to try to account for some amount of correlation across some subset of the samples, due to individual.

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

ADD COMMENT • link 7.1 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

I RTFM'd again and came across the section you link, had remembered this but thought I would not be able to use as I do not have Normal and Tumour in all Individual. The edit I made above runs without error, which I did not think would be the case. However, this is not accounting for variation due to Individual, or else there are some batch effects (although I am assured by those in lab that there are none...) based on PCA plots.

I will look into limma-voom method you mention. Thanks for your time and continued support of DESeq2,

Bruce.

ADD REPLY • link 7.1 years ago bruce.moran ▴ 30