Hello,
I've been trying to analyse a set of 18 samples using DESeq2 and I am unsure about the design. Here is the sample information:
      cell.line disease
L1_A  L1        Yes
L1_B  L1        Yes
L1_C  L1        Yes
L1_D  L1        No
L1_E  L1        No
L1_F  L1        No
L2_A  L2        Yes
L2_B  L2        Yes
L2_C  L2        Yes
L2_D  L2        No
L2_E  L2        No
L2_F  L2        No
L3_A  L3        Yes
L3_B  L3        Yes
L3_C  L3        Yes
L3_D  L3        No
L3_E  L3        No
L3_F  L3        No
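In case it helps to reproduce the setup, here is a minimal sketch of the same sample table as an R data frame. The object name coldata and the choice of "No" as the reference level of disease are assumptions made for illustration, not something fixed by the data.

```r
# Rebuild the sample table: 3 cell lines x 2 disease states x 3 replicates.
cell.line <- factor(rep(c("L1", "L2", "L3"), each = 6))

# Assumption: "No" (healthy) is set as the reference level.
disease <- factor(rep(rep(c("Yes", "No"), each = 3), times = 3),
                  levels = c("No", "Yes"))

# Sample names follow the L1_A ... L3_F pattern used in the table.
coldata <- data.frame(cell.line, disease,
                      row.names = paste(cell.line, LETTERS[1:6], sep = "_"))

# Cross-tabulate to check the layout: every cell line / disease combination
# should have the same number of replicates.
table(coldata$cell.line, coldata$disease)
```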
My aim is to identify differentially expressed genes between healthy and disease samples for each one of the cell lines.
As I am not interested here in the effect of disease or cell line across all samples, I assumed my design should include an interaction term such as cell.line:disease. However, I am unsure whether I should also add cell.line + disease to the design so that I can extract the comparisons I am interested in. I have also considered splitting the analysis into separate DESeq objects, one for each cell line, but I was worried I might run into multiple comparison issues.
I have been through many of the available posts related to this but found it difficult to determine whether or not the same strategies would apply here.
Pointers/advice would be greatly appreciated.
Thank you.
sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DESeq2_1.20.0 SummarizedExperiment_1.10.1 DelayedArray_0.6.6 BiocParallel_1.14.2
[5] matrixStats_0.54.0 Biobase_2.40.0 GenomicRanges_1.32.7 GenomeInfoDb_1.16.0
[9] IRanges_2.14.12 S4Vectors_0.18.3 BiocGenerics_0.26.0
loaded via a namespace (and not attached):
[1] bit64_0.9-7 splines_3.5.0 Formula_1.2-3 assertthat_0.2.0 latticeExtra_0.6-28
[6] blob_1.1.1 GenomeInfoDbData_1.1.0 yaml_2.2.0 pillar_1.3.0 RSQLite_2.1.1
[11] backports_1.1.2 lattice_0.20-38 glue_1.3.0 digest_0.6.18 RColorBrewer_1.1-2
[16] XVector_0.20.0 checkmate_1.8.5 colorspace_1.3-2 htmltools_0.3.6 Matrix_1.2-15
[21] plyr_1.8.4 XML_3.98-1.16 pkgconfig_2.0.2 genefilter_1.62.0 zlibbioc_1.26.0
[26] purrr_0.2.5 xtable_1.8-3 scales_1.0.0 htmlTable_1.12 tibble_1.4.2
[31] annotate_1.58.0 ggplot2_3.1.0 nnet_7.3-12 lazyeval_0.2.1 survival_2.43-3
[36] magrittr_1.5 crayon_1.3.4 memoise_1.1.0 foreign_0.8-71 tools_3.5.0
[41] data.table_1.11.8 stringr_1.3.1 locfit_1.5-9.1 munsell_0.5.0 cluster_2.0.7-1
[46] AnnotationDbi_1.42.1 bindrcpp_0.2.2 compiler_3.5.0 rlang_0.3.0.1 grid_3.5.0
[51] RCurl_1.95-4.11 rstudioapi_0.8 htmlwidgets_1.3 bitops_1.0-6 base64enc_0.1-3
[56] gtable_0.2.0 DBI_1.0.0 R6_2.3.0 gridExtra_2.3 knitr_1.20
[61] dplyr_0.7.8 bit_1.1-14 bindr_0.1.1 Hmisc_4.1-1 stringi_1.2.4
[66] Rcpp_1.0.0 geneplotter_1.58.0 rpart_4.1-13 acepack_1.4.1 tidyselect_0.2.5
Ah, I had somehow assumed that this approach would result in a loss of sensitivity because everything was analysed together, and that I would need to drop one of the terms in the design to prevent that. I now see how combining the factors could solve that. Thank you for drawing the parallel, it seems obvious now...
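For anyone finding this later, here is a minimal sketch of how I understand the combined-factor approach. It assumes DESeq2 is installed. The names group, cts and res are made up for illustration, and the counts are simulated placeholders from makeExampleDESeqDataSet() standing in for a real count matrix.

```r
library(DESeq2)

# Sample table as in the question: 3 cell lines x 2 disease states x 3 replicates.
cell.line <- factor(rep(c("L1", "L2", "L3"), each = 6))
disease   <- factor(rep(rep(c("Yes", "No"), each = 3), times = 3))
coldata   <- data.frame(cell.line, disease,
                        row.names = paste(cell.line, LETTERS[1:6], sep = "_"))

# Combine both factors into a single factor with 6 levels (L1_No, L1_Yes, ...).
coldata$group <- factor(paste(coldata$cell.line, coldata$disease, sep = "_"))

# Placeholder counts only: simulated data, NOT the real experiment.
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 18))
colnames(cts) <- rownames(coldata)

# One model fitted on all 18 samples, with the combined factor as the design.
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ group)
dds <- DESeq(dds)

# Disease vs healthy within each cell line, extracted from the same fit.
res <- lapply(levels(coldata$cell.line), function(cl) {
  results(dds, contrast = c("group", paste0(cl, "_Yes"), paste0(cl, "_No")))
})
names(res) <- levels(coldata$cell.line)
```

With six groups and six coefficients, ~ group fits the same model as cell.line + disease + cell.line:disease, just parametrised so that each within-cell-line comparison is a simple contrast. Dispersions are still estimated from all 18 samples, which is the part I had wrongly expected to cost sensitivity.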