Question: DESeq2 Formula Design with two variables and individual normalization
0
9 months ago by
polyptwo0
polyptwo0 wrote:

Formula design

We are struggling in understanding the correct design formula for my rnaseq analysis. We have 16 samples, 4 biological replicates with 2 different variables: if they are treated with a drug or not, and if they are stressed or not.

We want to know: a) Which is the effect of the treatment under basal conditions. b) Which is the effect of the treatment under stressed conditions.

During the exploratory analysis, we realized that the individual differences between biological replicates were bigger than than the ones due to treatment, or infection (i.e. the samples cluster together by biological replicate and not by treatment). We are not interested in quantifying these differences, but on normalizing for them (kinda like a paired analysis?). My question is the following:

Given that the 'genotype' (or 'family') effect will be inherent in all the samples (see colData below), should I take it into account when doing the design matrix?

i.e., my colData would look like this:

   genotype treatment    stress
1         A untreated  stressed
2         A   treated  stressed
3         A untreated        No
4         A   treated        No
5         B untreated  stressed
6         B   treated  stressed
7         B untreated        No
8         B   treated        No
9         C untreated  stressed
10        C   treated  stressed
11        C untreated        No
12        C   treated        No
13        D untreated  stressed
14        D   treated  stressed
15        D untreated        No
16        D   treated        No


If I'm understanding previous answers right, I should normalize by 'genotype', and the technical process would be similar of dealing with having a 'batch' effect, being every genotype a 'batch'. In this case, should our design look like this?

design = ~ treatment + treatment:genotype + treatment:stress


(This would be similar to this case: https://support.bioconductor.org/p/92108/)

Or should I collapse 'treatment' and 'stress' into a grouped 'condition' variable as suggested here (https://support.bioconductor.org/p/67600/#67612), and input the design:

design = ~ genotype + condition


I would then take out the results that we are interested with

## Which is the effect of the treatment under basal conditions.
results(dds, contrast=c("condition","treatedNo", "untreatedNo"))

## Which is the effect of the treatment under stressed conditions.
results(dds, contrast=c("condition","treatedstressed", "untreatedstressed"))


But I am not able to see how I would see the "extra" effect of the treatment under stressed conditions 'relative' to being stressed in this case.

Thank you so much for your help

deseq2 • 477 views
modified 9 months ago by Michael Love25k • written 9 months ago by polyptwo0
Answer: DESeq2 Formula Design with two variables and individual normalization
2
9 months ago by
Michael Love25k
United States
Michael Love25k wrote:

If you additionally want to see the "extra" effect, that is often called an interaction. You can use the design ~genotype + treatment + treatment:stress, where you will have two stress effects which you can pull out individually with name or contrast with contrast=list(..., ...).

However, the design ~genotype + stress:treatment is not possible as the model matrix is not full rank . I guess that I am not fully understanding the linearity here because with my colData (in the original post) I cannot relate to any of the examples on the DESeq2 vignette. There is one copy of each 'genotype' in each group, and they are not nested within groups.

The designs ~genotype + stress + treatment + stress:treatment, or ~genotype + treatment + stress:treatment work, though. Maybe one of those was the one that you meant?

Thank you so much for your help, and have a nice weekend,

PS: I am running

ddsTxi <- DESeqDataSetFromTximport(txi,
colData = coldata,
design = ~ genotype + stress:treatment)


and my sessionInfo() is:

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] tximport_1.10.1             ggplot2_3.1.0               gplots_3.0.1
[4] RColorBrewer_1.1-2          DESeq2_1.22.2               EDASeq_2.16.2
[10] DelayedArray_0.8.0          matrixStats_0.54.0          Rsamtools_1.34.0
[13] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1         Biostrings_2.50.2
[16] XVector_0.22.0              BiocParallel_1.16.5         geneplotter_1.60.0
[19] annotate_1.60.0             XML_3.98-1.16               AnnotationDbi_1.44.0
[22] IRanges_2.16.0              S4Vectors_0.20.1            lattice_0.20-38
[25] Biobase_2.42.0              BiocGenerics_0.28.0         biomaRt_2.38.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6           bit64_0.9-7            progress_1.2.0
[4] httr_1.4.0             tools_3.5.2            backports_1.1.3
[7] R6_2.3.0               KernSmooth_2.23-15     rpart_4.1-13
[10] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1
[13] colorspace_1.3-2       nnet_7.3-12            withr_2.1.2
[16] tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2
[19] bit_1.1-14             compiler_3.5.2         htmlTable_1.13.1
[22] labeling_0.3           rtracklayer_1.42.1     caTools_1.17.1.1
[28] genefilter_1.64.0      DESeq_1.34.1           stringr_1.3.1
[31] digest_0.6.18          foreign_0.8-71         R.utils_2.7.0
[34] base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6
[37] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0
[40] RSQLite_2.1.1          bindr_0.1.1            jsonlite_1.6
[43] hwriter_1.3.2          gtools_3.8.1           acepack_1.4.1
[46] dplyr_0.7.8            R.oo_1.22.0            RCurl_1.95-4.11
[49] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3
[52] Matrix_1.2-15          Rcpp_1.0.0             munsell_0.5.0
[55] R.methodsS3_1.7.1      stringi_1.2.4          yaml_2.2.0
[58] zlibbioc_1.28.0        plyr_1.8.4             grid_3.5.2
[61] blob_1.1.1             gdata_2.18.0           crayon_1.3.4
[64] splines_3.5.2          GenomicFeatures_1.34.1 hms_0.4.2
[67] locfit_1.5-9.1         knitr_1.21             pillar_1.3.1
[70] glue_1.3.0             latticeExtra_0.6-28    data.table_1.11.8
[73] BiocManager_1.30.4     gtable_0.2.0           purrr_0.2.5
[76] assertthat_0.2.0       xfun_0.4               aroma.light_3.12.0
[79] xtable_1.8-3           survival_2.43-3        tibble_2.0.0
[82] memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1

1

Apologies, I had a typo above and left out the base level.