Question

DESeq2 design with different cell lines and disease status

lu.ne • 0

@lune-19644

Hello,

I've been trying to analyse a set of 18 samples using DESeq2 and I am unsure about the design. Here is the sample information:

       cell.line     disease
L1_A   L1            Yes
L1_B   L1            Yes
L1_C   L1            Yes
L1_D   L1            No
L1_E   L1            No
L1_F   L1            No
L2_A   L2            Yes
L2_B   L2            Yes
L2_C   L2            Yes
L2_D   L2            No
L2_E   L2            No
L2_F   L2            No
L3_A   L3            Yes
L3_B   L3            Yes
L3_C   L3            Yes
L3_D   L3            No
L3_E   L3            No
L3_F   L3            No

My aim is to identify differentially expressed genes between healthy and disease samples within each cell line. Since I am not interested in the effect of disease or cell line across all samples, I assumed my design should include an interaction term such as cell.line:disease. However, I am unsure whether I should also add cell.line + disease to the design so that the comparisons I am interested in can be extracted sensibly.

I have also considered splitting the analysis into separate DESeq objects, one for each cell line, but I was worried I might run into multiple-comparison issues. I have been through many of the available posts on this topic but found it difficult to determine whether the same strategies would apply here. Pointers or advice would be greatly appreciated.

Thank you.

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.20.0               SummarizedExperiment_1.10.1 DelayedArray_0.6.6          BiocParallel_1.14.2        
 [5] matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.32.7        GenomeInfoDb_1.16.0        
 [9] IRanges_2.14.12             S4Vectors_0.18.3            BiocGenerics_0.26.0        

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.0          Formula_1.2-3          assertthat_0.2.0       latticeExtra_0.6-28   
 [6] blob_1.1.1             GenomeInfoDbData_1.1.0 yaml_2.2.0             pillar_1.3.0           RSQLite_2.1.1         
[11] backports_1.1.2        lattice_0.20-38        glue_1.3.0             digest_0.6.18          RColorBrewer_1.1-2    
[16] XVector_0.20.0         checkmate_1.8.5        colorspace_1.3-2       htmltools_0.3.6        Matrix_1.2-15         
[21] plyr_1.8.4             XML_3.98-1.16          pkgconfig_2.0.2        genefilter_1.62.0      zlibbioc_1.26.0       
[26] purrr_0.2.5            xtable_1.8-3           scales_1.0.0           htmlTable_1.12         tibble_1.4.2          
[31] annotate_1.58.0        ggplot2_3.1.0          nnet_7.3-12            lazyeval_0.2.1         survival_2.43-3       
[36] magrittr_1.5           crayon_1.3.4           memoise_1.1.0          foreign_0.8-71         tools_3.5.0           
[41] data.table_1.11.8      stringr_1.3.1          locfit_1.5-9.1         munsell_0.5.0          cluster_2.0.7-1       
[46] AnnotationDbi_1.42.1   bindrcpp_0.2.2         compiler_3.5.0         rlang_0.3.0.1          grid_3.5.0            
[51] RCurl_1.95-4.11        rstudioapi_0.8         htmlwidgets_1.3        bitops_1.0-6           base64enc_0.1-3       
[56] gtable_0.2.0           DBI_1.0.0              R6_2.3.0               gridExtra_2.3          knitr_1.20            
[61] dplyr_0.7.8            bit_1.1-14             bindr_0.1.1            Hmisc_4.1-1            stringi_1.2.4         
[66] Rcpp_1.0.0             geneplotter_1.58.0     rpart_4.1-13           acepack_1.4.1          tidyselect_0.2.5

deseq2 design

score 1 · Accepted Answer · 2019-01-29

Michael Love 43k

@mikelove

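If the goal is "to identify differentially expressed genes between healthy and disease samples for each one of the cell lines", take a look at the first paragraph of the vignette section on Interactions. It describes combining the factors of interest into a single factor containing all combinations of the original levels and using that factor alone in the design, so that each within-cell-line comparison can be extracted as a contrast.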
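As a rough sketch of that approach applied to the sample table in the question: the two factors are pasted into a single `group` factor, the design becomes `~ group`, and each disease-versus-healthy comparison is requested as a contrast between two group levels. The count matrix below is simulated purely for illustration (it is not the poster's data), and the object names (`coldata`, `gene_means`, `res_by_line`) are assumptions made for this example. The DESeq2 step only runs if the package is installed.

```r
# Sample table from the question: 3 cell lines x 2 disease states x 3 replicates.
coldata <- data.frame(
  cell.line = factor(rep(c("L1", "L2", "L3"), each = 6)),
  disease   = factor(rep(rep(c("Yes", "No"), each = 3), times = 3),
                     levels = c("No", "Yes"))
)
rownames(coldata) <- paste0(coldata$cell.line, "_", LETTERS[1:6])

# Combine the two factors into one factor with six levels (L1_No, L1_Yes, ...).
coldata$group <- factor(paste(coldata$cell.line, coldata$disease, sep = "_"))

# The ~ group design has one coefficient per group:
# an intercept plus five offsets from the reference level.
design_matrix <- model.matrix(~ group, data = coldata)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)
  n_genes <- 200
  # Simulated counts, for illustration only (NOT the poster's data).
  gene_means <- 2^runif(n_genes, min = 4, max = 10)
  counts <- matrix(
    rnbinom(n_genes * nrow(coldata), mu = gene_means, size = 5),
    nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)), rownames(coldata))
  )

  dds <- DESeq2::DESeqDataSetFromMatrix(counts, colData = coldata,
                                        design = ~ group)
  dds <- DESeq2::DESeq(dds)

  # Disease vs healthy within each cell line: one results table per line.
  res_by_line <- lapply(levels(coldata$cell.line), function(cl) {
    DESeq2::results(
      dds,
      contrast = c("group", paste0(cl, "_Yes"), paste0(cl, "_No"))
    )
  })
  names(res_by_line) <- levels(coldata$cell.line)
}
```

All 18 samples are fitted in one model here, so the three comparisons come from a single DESeq2 object rather than from three separate analyses.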

Ah, I had somehow assumed that this approach would result in a sensitivity loss because everything was analysed together and that I would need to drop one of the terms in the design to prevent that. I now see how combining the factors could solve that. Thank you for making the parallel, it seems obvious now...

Reply from lu.ne, 5.8 years ago
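On the point raised in the reply about needing to drop a term: one way to check that a combined-factor design loses nothing is to compare its model matrix with that of the full interaction design. The base-R sketch below assumes the sample layout from the question and uses illustrative names (`X_group`, `X_full`, `rank_of`). If both matrices and their column-wise combination have the same rank, the two designs describe the same fitted model and differ only in how the coefficients are labelled.

```r
# Sample layout from the question: 3 cell lines x 2 disease states x 3 replicates.
coldata <- data.frame(
  cell.line = factor(rep(c("L1", "L2", "L3"), each = 6)),
  disease   = factor(rep(rep(c("Yes", "No"), each = 3), times = 3),
                     levels = c("No", "Yes"))
)
coldata$group <- factor(paste(coldata$cell.line, coldata$disease, sep = "_"))

# Combined-factor design versus main effects plus interaction.
X_group <- model.matrix(~ group, data = coldata)
X_full  <- model.matrix(~ cell.line + disease + cell.line:disease,
                        data = coldata)

# Helper (name is illustrative): numerical rank via QR decomposition.
rank_of <- function(m) qr(m)$rank

ranks <- c(
  group = rank_of(X_group),
  full  = rank_of(X_full),
  both  = rank_of(cbind(X_group, X_full))
)

# TRUE when binding the two matrices adds no new dimensions,
# i.e. both designs span the same column space.
same_model <- length(unique(ranks)) == 1
```

With this layout each design has six coefficients, one per cell line and disease combination, so neither parameterisation leaves out a term that the other includes.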