Question: DESeq2 Formula Design with two variables and individual normalization
gravatar for polyptwo
9 weeks ago by
polyptwo0 wrote:

Formula design

We are struggling in understanding the correct design formula for my rnaseq analysis. We have 16 samples, 4 biological replicates with 2 different variables: if they are treated with a drug or not, and if they are stressed or not.

We want to know: a) Which is the effect of the treatment under basal conditions. b) Which is the effect of the treatment under stressed conditions.

During the exploratory analysis, we realized that the individual differences between biological replicates were bigger than than the ones due to treatment, or infection (i.e. the samples cluster together by biological replicate and not by treatment). We are not interested in quantifying these differences, but on normalizing for them (kinda like a paired analysis?). My question is the following:

Given that the 'genotype' (or 'family') effect will be inherent in all the samples (see colData below), should I take it into account when doing the design matrix?

i.e., my colData would look like this:

   genotype treatment    stress
1         A untreated  stressed
2         A   treated  stressed
3         A untreated        No
4         A   treated        No
5         B untreated  stressed
6         B   treated  stressed
7         B untreated        No
8         B   treated        No
9         C untreated  stressed
10        C   treated  stressed
11        C untreated        No
12        C   treated        No
13        D untreated  stressed
14        D   treated  stressed
15        D untreated        No
16        D   treated        No

If I'm understanding previous answers right, I should normalize by 'genotype', and the technical process would be similar of dealing with having a 'batch' effect, being every genotype a 'batch'. In this case, should our design look like this?

design = ~ treatment + treatment:genotype + treatment:stress

(This would be similar to this case:

Or should I collapse 'treatment' and 'stress' into a grouped 'condition' variable as suggested here (, and input the design:

design = ~ genotype + condition

I would then take out the results that we are interested with

## Which is the effect of the treatment under basal conditions.
results(dds, contrast=c("condition","treatedNo", "untreatedNo")) 

## Which is the effect of the treatment under stressed conditions.
results(dds, contrast=c("condition","treatedstressed", "untreatedstressed"))

But I am not able to see how I would see the "extra" effect of the treatment under stressed conditions 'relative' to being stressed in this case.

Thank you so much for your help

deseq2 • 175 views
ADD COMMENTlink modified 9 weeks ago by Michael Love22k • written 9 weeks ago by polyptwo0
Answer: DESeq2 Formula Design with two variables and individual normalization
gravatar for Michael Love
9 weeks ago by
Michael Love22k
United States
Michael Love22k wrote:

If you additionally want to see the "extra" effect, that is often called an interaction. You can use the design ~genotype + treatment + treatment:stress, where you will have two stress effects which you can pull out individually with name or contrast with contrast=list(..., ...).

ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by Michael Love22k

Thanks for your response Michael.

However, the design ~genotype + stress:treatment is not possible as the model matrix is not full rank . I guess that I am not fully understanding the linearity here because with my colData (in the original post) I cannot relate to any of the examples on the DESeq2 vignette. There is one copy of each 'genotype' in each group, and they are not nested within groups.

The designs ~genotype + stress + treatment + stress:treatment, or ~genotype + treatment + stress:treatment work, though. Maybe one of those was the one that you meant?

Thank you so much for your help, and have a nice weekend,

PS: I am running

ddsTxi <- DESeqDataSetFromTximport(txi,
                                    colData = coldata,
                                    design = ~ genotype + stress:treatment)

and my sessionInfo() is:

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximport_1.10.1             ggplot2_3.1.0               gplots_3.0.1               
 [4] RColorBrewer_1.1-2          DESeq2_1.22.2               EDASeq_2.16.2              
 [7] ShortRead_1.40.0            GenomicAlignments_1.18.1    SummarizedExperiment_1.12.0
[10] DelayedArray_0.8.0          matrixStats_0.54.0          Rsamtools_1.34.0           
[13] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1         Biostrings_2.50.2          
[16] XVector_0.22.0              BiocParallel_1.16.5         geneplotter_1.60.0         
[19] annotate_1.60.0             XML_3.98-1.16               AnnotationDbi_1.44.0       
[22] IRanges_2.16.0              S4Vectors_0.20.1            lattice_0.20-38            
[25] Biobase_2.42.0              BiocGenerics_0.28.0         biomaRt_2.38.0             

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           bit64_0.9-7            progress_1.2.0        
 [4] httr_1.4.0             tools_3.5.2            backports_1.1.3       
 [7] R6_2.3.0               KernSmooth_2.23-15     rpart_4.1-13          
[10] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1        
[13] colorspace_1.3-2       nnet_7.3-12            withr_2.1.2           
[16] tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2     
[19] bit_1.1-14             compiler_3.5.2         htmlTable_1.13.1      
[22] labeling_0.3           rtracklayer_1.42.1     caTools_1.17.1.1      
[25] scales_1.0.0           checkmate_1.9.0        readr_1.3.1           
[28] genefilter_1.64.0      DESeq_1.34.1           stringr_1.3.1         
[31] digest_0.6.18          foreign_0.8-71         R.utils_2.7.0         
[34] base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6       
[37] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0      
[40] RSQLite_2.1.1          bindr_0.1.1            jsonlite_1.6          
[43] hwriter_1.3.2          gtools_3.8.1           acepack_1.4.1         
[46] dplyr_0.7.8            R.oo_1.22.0            RCurl_1.95-4.11       
[49] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3         
[52] Matrix_1.2-15          Rcpp_1.0.0             munsell_0.5.0         
[55] R.methodsS3_1.7.1      stringi_1.2.4          yaml_2.2.0            
[58] zlibbioc_1.28.0        plyr_1.8.4             grid_3.5.2            
[61] blob_1.1.1             gdata_2.18.0           crayon_1.3.4          
[64] splines_3.5.2          GenomicFeatures_1.34.1 hms_0.4.2             
[67] locfit_1.5-9.1         knitr_1.21             pillar_1.3.1          
[70] glue_1.3.0             latticeExtra_0.6-28    data.table_1.11.8     
[73] BiocManager_1.30.4     gtable_0.2.0           purrr_0.2.5           
[76] assertthat_0.2.0       xfun_0.4               aroma.light_3.12.0    
[79] xtable_1.8-3           survival_2.43-3        tibble_2.0.0          
[82] memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1
ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by polyptwo0

Apologies, I had a typo above and left out the base level.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Michael Love22k

Thanks for your help!

ADD REPLYlink written 9 weeks ago by polyptwo0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 240 users visited in the last hour