Question: DESEq2 error when performing exploratory analysis with no replicates
0
2.9 years ago by
pro_zac0
pro_zac0 wrote:

Hello,

I have recently been trying to analyse 12 non-replicated metagenome samples using DESeq2 as an exploratory analysis to find genes with differing abundance to allow for inferences to be obtained with respect to GO terms and KEGG pathways associated with these genes. I have formatted a counts table from a Bowtie2 mapping into HTSeq gene region count extraction procedure which I can successfully read into DESeq2. However, when I attempt to run the program, I receive the following error (note that I'm running it in parallel, and my data set is very, very large):

mean-dispersion relationship
final dispersion estimates, MLE betas: 12 workers
fitting model and testing: 12 workers
-- replacing outliers and refitting for 2151512 genes
-- DESeq argument 'minReplicatesForReplace' = 7
=-- original counts are preserved in counts(dds)
estimating dispersions
Error in estimateDispersionsGeneEst(objectSub, quiet = quiet, modelMatrix = modelMatrix) :
the number of samples and the number of model coefficients are equal,
i.e., there are no replicates to estimate the dispersion.
use an alternate design formula
Calls: DESeq -> refitWithoutOutliers -> estimateDispersionsGeneEst
In checkForExperimentalReplicates(object, modelMatrix) :
same number of samples and coefficients to fit,
estimating dispersion by treating samples as replicates.
read the ?DESeq section on 'Experiments without replicates'
Execution halted

The R output including headers for my input files as well as the sessionInfo are provided below:

R version 3.2.5 (2016-04-14) -- "Very, Very Secure Dishes"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> ############## START
>
> # Load deseq2 and set working directory
> library(DESeq2)
> library(pheatmap)
> library(RColorBrewer)
> library(BiocParallel)
> setwd("/home/workingdir")
>
> register(MulticoreParam(12))
>
> countData = as.matrix(read.table(file='combined.htseq.counts', header = TRUE, sep = "\t", stringsAsFactors = FALSE, row.names="gene_id"))
AR DW GH1  LS   OF STP1 STP2 STP11A STP12B STP13A STP15C  WS
OF0002336      0  8   0   0 1744    0    0      0      0      0      0   0
OF0057878      0  0   0   0 1560    0    6      2      4      0      0   0
OF0008217      0  0   0   0 1004    0    0      0      0      0      0   0
STP11A0014293  0  0   2   0   33    2  378    697   1674    294    474   0
LS0032436      0  0   0 981    0    0    0      0      0      1      0 149
OF0055859      0  0   0   0  892    0    0      0      1      1      0   0
>
condition
1        AR
2        DW
3       GH1
4        LS
5        OF
6      STP1
>
> # Create DESeq2 data set
> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)
> dds <- dds[ rowSums(counts(dds)) > 1, ]   # 1.3.6 Pre-filtering - removes genes for which only 0 or 1 reads are present
>
> # Reduce RAM usage
> rm(countData)
> rm(colData)
>
> # Session info
> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 11 SP4

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
[1] BiocParallel_1.4.3         RColorBrewer_1.1-2
[3] pheatmap_1.0.8             DESeq2_1.10.1
[7] SummarizedExperiment_1.0.2 Biobase_2.30.0
[9] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3
[11] IRanges_2.4.8              S4Vectors_0.8.11
[13] BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
[1] genefilter_1.52.1    locfit_1.5-9.1       splines_3.2.5
[4] lattice_0.20-33      colorspace_1.3-1     htmltools_0.3.5
[7] chron_2.3-47         survival_2.40-1      XML_3.98-1.5
[10] foreign_0.8-66       DBI_0.5-1            lambda.r_1.1.9
[13] plyr_1.8.4           stringr_1.1.0        zlibbioc_1.16.0
[16] munsell_0.4.3        gtable_0.2.0         futile.logger_1.4.3
[19] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.48.0
[22] AnnotationDbi_1.32.3 htmlTable_1.7        acepack_1.4.1
[25] xtable_1.8-2         scales_0.4.1         Hmisc_4.0-0
[28] annotate_1.48.0      XVector_0.10.0       gridExtra_2.2.1
[31] ggplot2_2.2.0        digest_0.6.10        stringi_1.1.2
[34] grid_3.2.5           tools_3.2.5          magrittr_1.5
[37] lazyeval_0.2.0       tibble_1.2           RSQLite_1.0.0
[40] Formula_1.2-1        cluster_2.0.5        futile.options_1.0.0
[43] Matrix_1.2-4         data.table_1.9.6     assertthat_0.1
[46] rpart_4.1-10         nnet_7.3-12
>
> # Perform the differential expression (1.4)
> dds <- DESeq(dds, parallel=TRUE)

^^ This is where the code dies

If someone could help me figure out why DESeq2 is not allowing me to run without replicates (as I understand it is capable of despite encouraging you not to take any results as conclusive) that would be appreciated. Thanks.

modified 2.9 years ago by Michael Love25k • written 2.9 years ago by pro_zac0
Answer: DESEq2 error when performing exploratory analysis with no replicates
1
2.9 years ago by
Simon Anders3.6k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.6k wrote:

DESeq2 is a tool to test statistical hypotheses, i.e., to calculate p values to quantify how much evidence you have for differential evidence. As you don't have replicates, you can't have p values.

For exploratory data analysis, you could consider using the "rlog" transformation and then just look at the genes with highest variance across conditions.

Hi Simon,

Thanks for your response. From reading elsewhere I was given the impression that running DESeq() normally should still provide me with some level of results rather than crashing, but I may have misinterpreted these comments. I will look into how I can best analyse rlog data in as scientific a manner as possible, then.

Answer: DESEq2 error when performing exploratory analysis with no replicates
1
2.9 years ago by
Michael Love25k
United States
Michael Love25k wrote:

I can't replicate the "Execution halted" using simulated data and the current version of DESeq2 (1.14), but I do see a problem in the code base: here we should not replace outliers because there are no replicates (it only seems so for a period during the dispersion estimation when DESeq2 pretends the different conditions are replicates).

You should be able to bypass this by setting minReplicateForReplace=Inf, and I can fix this in future versions to not try to replace outliers when there are no replicates.