DESeq2::DESeq(..., parallel = TRUE) fails if a partition has only genes with 0 counts across all samples
1
0
Entering edit mode
Steffi Grote ▴ 10
@f67b142b
Last seen 7 weeks ago
Germany, Göttingen

Hi,

using DESeq2::DESeq(..., parallel = TRUE) with a specific dataset, where genes (rows) were sorted by abundance, I have observed the following error:

Error in (function (cond) :
error in evaluating the argument 'args' in selecting a method for function 'do.call': BiocParallel errors element index: 3
first error: error in evaluating the argument 'x' in selecting a method for function 't': no right-hand side in 'b'

I can reproduce the error with a modified example dataset as used in DESeq2's vignette:

# 1) Read example dataset from pasilla

library("pasilla")
pasCts <- system.file("extdata",
  "pasilla_gene_counts.tsv",
  package = "pasilla", mustWork = TRUE
)
pasAnno <- system.file("extdata",
  "pasilla_sample_annotation.csv",
  package = "pasilla", mustWork = TRUE
)
cts <- as.matrix(read.csv(pasCts, sep = "\t", row.names = "gene_id"))

# there are already quite some genes which have 0 counts across all samples
table(rowSums(cts) == 0)
# FALSE  TRUE 
# 12359  2240 

coldata <- read.csv(pasAnno, row.names = 1)
coldata <- coldata[, c("condition", "type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)


# 2) Modify the count matrix to trigger the error

# set the last 2/5 of genes to 0 counts for all samples
i <- cut(seq_len(nrow(cts)), 5, labels = FALSE)
cts[i %in% c(4,5), ] <- 0


# 3) Run DESeq with different numbers of cores

library("DESeq2")
dds <- DESeqDataSetFromMatrix(
  countData = cts,
  colData = coldata,
  design = ~condition
)

# using 3 cores or more triggers the error
BiocParallel::register(BiocParallel::MulticoreParam(3))
res <- DESeq(dds, parallel = TRUE)

# using less than 3 cores works just fine
BiocParallel::register(BiocParallel::MulticoreParam(2))
res <- DESeq(dds, parallel = TRUE)
res <- DESeq(dds, parallel = FALSE)


# 4) It looks like blocks where all genes have 0 counts make the difference:

counts_per_gene <- rowSums(cts)

# if genes are divided into 3 blocks, the last block would have 0-count genes only:
i <- cut(seq_along(counts_per_gene), 3, labels = FALSE)
mean_counts_per_partition <- sapply(unique(i), function(j) {
  mean(counts_per_gene[i == j])  
})
mean_counts_per_partition
# [1] 11378.01  3433.68     0.00

# if divided into 2 blocks, none would have 0-count genes only:
i <- cut(seq_along(counts_per_gene), 2, labels = FALSE)
mean_counts_per_partition <- sapply(unique(i), function(j) {
  mean(counts_per_gene[i == j])  
})
mean_counts_per_partition
# [1] 9123.2308  751.5357

While this problem can easily be tackled by using parallel = FALSE or by removing rows with 0 counts beforehand, it might be nicer if such situation would be caught inside DESeq2::DESeq.

Thanks in advance,
Steffi

Here is my session Info:

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS/LAPACK: /software/R-v4.0.4/lib/libopenblasp-r0.3.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.30.1               SummarizedExperiment_1.20.0 Biobase_2.50.0              MatrixGenerics_1.2.1       
 [5] matrixStats_0.58.0          GenomicRanges_1.42.0        GenomeInfoDb_1.26.4         IRanges_2.24.1             
 [9] S4Vectors_0.28.1            BiocGenerics_0.36.0         pasilla_1.18.1             

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6             locfit_1.5-9.4         lattice_0.20-41        assertthat_0.2.1       utf8_1.2.1            
 [6] R6_2.5.0               RSQLite_2.2.4          evoBase_0.1.0          httr_1.4.2             ggplot2_3.3.3         
[11] pillar_1.5.1           zlibbioc_1.36.0        rlang_0.4.10           annotate_1.68.0        blob_1.2.1            
[16] shinyEvotec_1.0        Matrix_1.3-2           splines_4.0.4          BiocParallel_1.24.1    geneplotter_1.68.0    
[21] RCurl_1.98-1.3         bit_4.0.4              munsell_0.5.0          DelayedArray_0.16.2    compiler_4.0.4        
[26] pkgconfig_2.0.3        tidyselect_1.1.0       tibble_3.1.0           GenomeInfoDbData_1.2.4 XML_3.99-0.6          
[31] fansi_0.4.2            crayon_1.4.1           dplyr_1.0.5            bitops_1.0-6           grid_4.0.4            
[36] xtable_1.8-4           gtable_0.3.0           lifecycle_1.0.0        DBI_1.1.1              magrittr_2.0.1        
[41] scales_1.1.1           debugme_1.1.0          cachem_1.0.4           XVector_0.30.0         genefilter_1.72.1     
[46] ellipsis_0.3.1         vctrs_0.3.6            generics_0.1.0         fst_0.9.4              RColorBrewer_1.1-2    
[51] tools_4.0.4            bit64_4.0.5            glue_1.4.2             purrr_0.3.4            fastmap_1.1.0         
[56] survival_3.2-10        yaml_2.2.1             AnnotationDbi_1.52.0   colorspace_2.0-0       memoise_2.0.0
DESeq2 BiocParallel • 1.7k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

Thanks for the note. I'll add this to the documentation now. It will be much easier to handle this just by filtering out all 0's genes from the start. Otherwise it's a waste of memory: sending a large matrix of 0's to a child process though this will result only in NAs.

ADD COMMENT
0
Entering edit mode

Thanks for the quick answer!

ADD REPLY
0
Entering edit mode

I took some time to also update the section in the vignette on parallelization:

https://github.com/mikelove/DESeq2/commit/e414a2fba3db15b77db2f590ad189ae24ff9af42

ADD REPLY
0
Entering edit mode

Great! Thank you.

ADD REPLY

Login before adding your answer.

Traffic: 978 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6