DESeq2 Formula Design with two variables and individual normalization
1
0
Entering edit mode
polyptwo • 0
@polyptwo-19338
Last seen 4.4 years ago

Formula design

We are struggling in understanding the correct design formula for my rnaseq analysis. We have 16 samples, 4 biological replicates with 2 different variables: if they are treated with a drug or not, and if they are stressed or not.

We want to know: a) Which is the effect of the treatment under basal conditions. b) Which is the effect of the treatment under stressed conditions.

During the exploratory analysis, we realized that the individual differences between biological replicates were bigger than than the ones due to treatment, or infection (i.e. the samples cluster together by biological replicate and not by treatment). We are not interested in quantifying these differences, but on normalizing for them (kinda like a paired analysis?). My question is the following:

Given that the 'genotype' (or 'family') effect will be inherent in all the samples (see colData below), should I take it into account when doing the design matrix?

i.e., my colData would look like this:

   genotype treatment    stress
1         A untreated  stressed
2         A   treated  stressed
3         A untreated        No
4         A   treated        No
5         B untreated  stressed
6         B   treated  stressed
7         B untreated        No
8         B   treated        No
9         C untreated  stressed
10        C   treated  stressed
11        C untreated        No
12        C   treated        No
13        D untreated  stressed
14        D   treated  stressed
15        D untreated        No
16        D   treated        No

If I'm understanding previous answers right, I should normalize by 'genotype', and the technical process would be similar of dealing with having a 'batch' effect, being every genotype a 'batch'. In this case, should our design look like this?

design = ~ treatment + treatment:genotype + treatment:stress

(This would be similar to this case: https://support.bioconductor.org/p/92108/)

Or should I collapse 'treatment' and 'stress' into a grouped 'condition' variable as suggested here (https://support.bioconductor.org/p/67600/#67612), and input the design:

design = ~ genotype + condition

I would then take out the results that we are interested with

## Which is the effect of the treatment under basal conditions.
results(dds, contrast=c("condition","treatedNo", "untreatedNo")) 

## Which is the effect of the treatment under stressed conditions.
results(dds, contrast=c("condition","treatedstressed", "untreatedstressed"))

But I am not able to see how I would see the "extra" effect of the treatment under stressed conditions 'relative' to being stressed in this case.

Thank you so much for your help

deseq2 • 4.6k views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 9 hours ago
United States

If you additionally want to see the "extra" effect, that is often called an interaction. You can use the design ~genotype + treatment + treatment:stress, where you will have two stress effects which you can pull out individually with name or contrast with contrast=list(..., ...).

ADD COMMENT
0
Entering edit mode

Thanks for your response Michael.

However, the design ~genotype + stress:treatment is not possible as the model matrix is not full rank . I guess that I am not fully understanding the linearity here because with my colData (in the original post) I cannot relate to any of the examples on the DESeq2 vignette. There is one copy of each 'genotype' in each group, and they are not nested within groups.

The designs ~genotype + stress + treatment + stress:treatment, or ~genotype + treatment + stress:treatment work, though. Maybe one of those was the one that you meant?

Thank you so much for your help, and have a nice weekend,

PS: I am running

ddsTxi <- DESeqDataSetFromTximport(txi,
                                    colData = coldata,
                                    design = ~ genotype + stress:treatment)

and my sessionInfo() is:

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
[5] LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tximport_1.10.1             ggplot2_3.1.0               gplots_3.0.1               
 [4] RColorBrewer_1.1-2          DESeq2_1.22.2               EDASeq_2.16.2              
 [7] ShortRead_1.40.0            GenomicAlignments_1.18.1    SummarizedExperiment_1.12.0
[10] DelayedArray_0.8.0          matrixStats_0.54.0          Rsamtools_1.34.0           
[13] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1         Biostrings_2.50.2          
[16] XVector_0.22.0              BiocParallel_1.16.5         geneplotter_1.60.0         
[19] annotate_1.60.0             XML_3.98-1.16               AnnotationDbi_1.44.0       
[22] IRanges_2.16.0              S4Vectors_0.20.1            lattice_0.20-38            
[25] Biobase_2.42.0              BiocGenerics_0.28.0         biomaRt_2.38.0             

loaded via a namespace (and not attached):
 [1] bitops_1.0-6           bit64_0.9-7            progress_1.2.0        
 [4] httr_1.4.0             tools_3.5.2            backports_1.1.3       
 [7] R6_2.3.0               KernSmooth_2.23-15     rpart_4.1-13          
[10] Hmisc_4.1-1            DBI_1.0.0              lazyeval_0.2.1        
[13] colorspace_1.3-2       nnet_7.3-12            withr_2.1.2           
[16] tidyselect_0.2.5       gridExtra_2.3          prettyunits_1.0.2     
[19] bit_1.1-14             compiler_3.5.2         htmlTable_1.13.1      
[22] labeling_0.3           rtracklayer_1.42.1     caTools_1.17.1.1      
[25] scales_1.0.0           checkmate_1.9.0        readr_1.3.1           
[28] genefilter_1.64.0      DESeq_1.34.1           stringr_1.3.1         
[31] digest_0.6.18          foreign_0.8-71         R.utils_2.7.0         
[34] base64enc_0.1-3        pkgconfig_2.0.2        htmltools_0.3.6       
[37] htmlwidgets_1.3        rlang_0.3.1            rstudioapi_0.9.0      
[40] RSQLite_2.1.1          bindr_0.1.1            jsonlite_1.6          
[43] hwriter_1.3.2          gtools_3.8.1           acepack_1.4.1         
[46] dplyr_0.7.8            R.oo_1.22.0            RCurl_1.95-4.11       
[49] magrittr_1.5           GenomeInfoDbData_1.2.0 Formula_1.2-3         
[52] Matrix_1.2-15          Rcpp_1.0.0             munsell_0.5.0         
[55] R.methodsS3_1.7.1      stringi_1.2.4          yaml_2.2.0            
[58] zlibbioc_1.28.0        plyr_1.8.4             grid_3.5.2            
[61] blob_1.1.1             gdata_2.18.0           crayon_1.3.4          
[64] splines_3.5.2          GenomicFeatures_1.34.1 hms_0.4.2             
[67] locfit_1.5-9.1         knitr_1.21             pillar_1.3.1          
[70] glue_1.3.0             latticeExtra_0.6-28    data.table_1.11.8     
[73] BiocManager_1.30.4     gtable_0.2.0           purrr_0.2.5           
[76] assertthat_0.2.0       xfun_0.4               aroma.light_3.12.0    
[79] xtable_1.8-3           survival_2.43-3        tibble_2.0.0          
[82] memoise_1.1.0          bindrcpp_0.2.2         cluster_2.0.7-1
ADD REPLY
1
Entering edit mode

Apologies, I had a typo above and left out the base level.

ADD REPLY
0
Entering edit mode

Thanks for your help!

ADD REPLY

Login before adding your answer.

Traffic: 920 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6