I am running a differential gene expression analysis on the TCGA-BRCA cohort to investigate the effect of tumour cellularity on differential gene expression analysis (how the biology inferred from these analyses may differ if cellularity is included as a covariate). My experimental setup include running the
DESeq function with (1) batch effects (BatchId + TSS) and cellularity (ESTIMATE) as a covariate and (2) batch effects (BatchId + TSS) only. The DGE analysis is done between two subtypes of breast cancer (encoded as the HR_status variable)
# Generate DESeq object from raw counts and include batch effects and cellularity as coveriate DEseq.pur <- DESeqDataSetFromMatrix(countData = rounded.est, colData = pca.plots.all.purity.est, design = ~ HR_status + BatchId + TSS + ESTIMATE) # Object generate without cellularity as a covariate DEseq.nopur <- DESeqDataSetFromMatrix(countData = rounded.est, colData = pca.plots.all.purity.est, design = ~ HR_status + BatchId + TSS)
After filtering the initial dataset to remove low count genes, I have a DESeqDataSet consisting of 33980 genes and 823 samples. The problem is that for experiment 2 (batch effects only), the
DESeq function is not running to completion, despite allowing for more than 12 hours and 10 cpus. The function is stuck on estimating gene-wise dispersion.
library(DESeq2) library(BiocParallel) load("DEseq.est.Rdata") DEseq.est.run <- DESeq(DEseq.est, parallel=TRUE, BPPARAM=MulticoreParam(10))
Weird thing is that if run experiment 1 (batch effects and cellularity as a covariate) with the same specs, this time
DESeq runs in 5 hours. I can't figure out what is happening to cause this....
Output of sessionInfo:
R version 3.6.1 (2019-07-05) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) Matrix products: default BLAS: /home/uqmdeluc/R-3.6.1/lib/libRblas.so LAPACK: /home/uqmdeluc/R-3.6.1/lib/libRlapack.so locale:  LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C  LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8  LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8  LC_PAPER=en_AU.UTF-8 LC_NAME=C  LC_ADDRESS=C LC_TELEPHONE=C  LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C attached base packages:  parallel stats4 stats graphics grDevices utils datasets  methods base other attached packages:  DESeq2_1.26.0 SummarizedExperiment_1.16.1  DelayedArray_0.12.3 BiocParallel_1.20.1  matrixStats_0.58.0 Biobase_2.46.0  GenomicRanges_1.38.0 GenomeInfoDb_1.22.1  IRanges_2.20.2 S4Vectors_0.24.4  BiocGenerics_0.32.0 loaded via a namespace (and not attached):  bit64_4.0.5 splines_3.6.1 Formula_1.2-4  assertthat_0.2.1 latticeExtra_0.6-29 blob_1.2.1  GenomeInfoDbData_1.2.2 pillar_1.5.0 RSQLite_2.2.3  backports_1.2.1 lattice_0.20-38 glue_1.4.2  digest_0.6.27 RColorBrewer_1.1-2 XVector_0.26.0  checkmate_2.0.0 colorspace_2.0-0 htmltools_0.5.1.1  Matrix_1.2-17 XML_3.99-0.3 pkgconfig_2.0.3  genefilter_1.68.0 zlibbioc_1.32.0 purrr_0.3.4  xtable_1.8-4 scales_1.1.1 jpeg_0.1-8.1  htmlTable_2.1.0 tibble_3.1.0 annotate_1.64.0  generics_0.1.0 ggplot2_3.3.3 ellipsis_0.3.1  cachem_1.0.4 nnet_7.3-12 survival_3.2-7  magrittr_2.0.1 crayon_1.4.1 memoise_2.0.0  fansi_0.4.2 foreign_0.8-71 tools_3.6.1  data.table_1.14.0 lifecycle_1.0.0 stringr_1.4.0  locfit_1.5-9.4 munsell_0.5.0 cluster_2.1.0  AnnotationDbi_1.48.0 compiler_3.6.1 rlang_0.4.10  grid_3.6.1 RCurl_1.98-1.2 rstudioapi_0.13  htmlwidgets_1.5.3 bitops_1.0-6 base64enc_0.1-3  gtable_0.3.0 DBI_1.1.1 R6_2.5.0  gridExtra_2.3 knitr_1.31 dplyr_1.0.4  fastmap_1.1.0 bit_4.0.4 utf8_1.1.4  Hmisc_4.5-0 stringi_1.5.3 Rcpp_1.0.6  geneplotter_1.64.0 vctrs_0.3.6 rpart_4.1-15  png_0.1-7 tidyselect_1.1.0 xfun_0.21