Dear Mike Love,
I just upgraded DESeq2 to get the last version (1.12.0) and I have a problem when using the estimateSizeFactors() function on a DESeqDataSet object built from RSEM outputs (with the help of tximport). The problem is that infinite values are generated because of the use of the log() on transcript/gene lengths equal to 0 (in the txi.rsem$length matrix) when computing the geometric mean. Here is a piece of R code from the tximport vignette slightly modified to reproduce the error:
library(tximportData) library(tximport) dir <- system.file("extdata", package = "tximportData") samples <- read.table(file.path(dir, "samples.txt"), header = TRUE) files <- file.path(dir, "rsem", samples$run, paste0(samples$run, ".genes.results")) names(files) <- paste0("sample", 1:6) txi.rsem <- tximport(files, type = "rsem") ### modification: ### txi.rsem$length[which.min(txi.rsem$length)] <- 0 ddsrsem <- DESeqDataSetFromTximport(txi.rsem, colData=samples, design=~1) ddsrsem <- estimateSizeFactors(ddsrsem) # Erreur : all(!is.na(value)) is not TRUE
Do you think I should filter out rows for which there is at least one transcript/gene length equal to 0?
Below is the result of my sessionInfo():R version 3.3.0 (2016-05-03) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04 LTS
locale: [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] readr_0.2.2 tximport_1.0.0 tximportData_1.0.0 DESeq2_1.12.0 SummarizedExperiment_1.2.0 Biobase_2.32.0 GenomicRanges_1.24.0 [8] GenomeInfoDb_1.8.0 IRanges_2.6.0 S4Vectors_0.10.0 BiocGenerics_0.18.0
loaded via a namespace (and not attached): [1] Rcpp_0.12.4.5 RColorBrewer_1.1-2 plyr_1.8.3 XVector_0.12.0 tools_3.3.0 zlibbioc_1.18.0 rpart_4.1-10 RSQLite_1.0.0 annotate_1.50.0 [10] gtable_0.2.0 lattice_0.20-33 Matrix_1.2-6 DBI_0.4-1 gridExtra_2.2.1 genefilter_1.54.0 cluster_2.0.4 locfit_1.5-9.1 grid_3.3.0 [19] nnet_7.3-12 data.table_1.9.6 AnnotationDbi_1.34.0 XML_3.98-1.4 survival_2.39-3 BiocParallel_1.6.0 foreign_0.8-66 latticeExtra_0.6-28 Formula_1.2-1 [28] geneplotter_1.50.0 ggplot2_2.1.0 Hmisc_3.17-4 scales_0.4.0 splines_3.3.0 colorspace_1.2-6 xtable_1.8-2 acepack_1.3-3.3 munsell_0.4.3 [37] chron_2.3-47
Please let me know if you need more information.
Many thanks for your help, best regards,
Hugo