Question: DESEq2 error when performing exploratory analysis with no replicates
gravatar for pro_zac
12 months ago by
pro_zac0 wrote:


I have recently been trying to analyse 12 non-replicated metagenome samples using DESeq2 as an exploratory analysis to find genes with differing abundance to allow for inferences to be obtained with respect to GO terms and KEGG pathways associated with these genes. I have formatted a counts table from a Bowtie2 mapping into HTSeq gene region count extraction procedure which I can successfully read into DESeq2. However, when I attempt to run the program, I receive the following error (note that I'm running it in parallel, and my data set is very, very large):

mean-dispersion relationship
final dispersion estimates, MLE betas: 12 workers
fitting model and testing: 12 workers
-- replacing outliers and refitting for 2151512 genes
-- DESeq argument 'minReplicatesForReplace' = 7
=-- original counts are preserved in counts(dds)
estimating dispersions
Error in estimateDispersionsGeneEst(objectSub, quiet = quiet, modelMatrix = modelMatrix) :
  the number of samples and the number of model coefficients are equal,
  i.e., there are no replicates to estimate the dispersion.
  use an alternate design formula
Calls: DESeq -> refitWithoutOutliers -> estimateDispersionsGeneEst
In addition: Warning message:
In checkForExperimentalReplicates(object, modelMatrix) :
  same number of samples and coefficients to fit,
  estimating dispersion by treating samples as replicates.
  read the ?DESeq section on 'Experiments without replicates'
Execution halted

The R output including headers for my input files as well as the sessionInfo are provided below:

R version 3.2.5 (2016-04-14) -- "Very, Very Secure Dishes"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> ############## START
> # Load deseq2 and set working directory
> library(DESeq2)
> library(pheatmap)
> library(RColorBrewer)
> library(BiocParallel)
> setwd("/home/workingdir")
> register(MulticoreParam(12))
> # Load in data
> countData = as.matrix(read.table(file='combined.htseq.counts', header = TRUE, sep = "\t", stringsAsFactors = FALSE, row.names="gene_id"))
> head(countData)
              AR DW GH1  LS   OF STP1 STP2 STP11A STP12B STP13A STP15C  WS
OF0002336      0  8   0   0 1744    0    0      0      0      0      0   0
OF0057878      0  0   0   0 1560    0    6      2      4      0      0   0
OF0008217      0  0   0   0 1004    0    0      0      0      0      0   0
STP11A0014293  0  0   2   0   33    2  378    697   1674    294    474   0
LS0032436      0  0   0 981    0    0    0      0      0      1      0 149
OF0055859      0  0   0   0  892    0    0      0      1      1      0   0
> colData = read.csv('colData.csv')
> head(colData)
1        AR
2        DW
3       GH1
4        LS
5        OF
6      STP1
> # Create DESeq2 data set
> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)
> dds <- dds[ rowSums(counts(dds)) > 1, ]   # 1.3.6 Pre-filtering - removes genes for which only 0 or 1 reads are present
> # Reduce RAM usage
> rm(countData)
> rm(colData)
> # Session info
> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 11 SP4

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] BiocParallel_1.4.3         RColorBrewer_1.1-2        
 [3] pheatmap_1.0.8             DESeq2_1.10.1             
 [5] RcppArmadillo_0.7.500.0.0  Rcpp_0.12.8               
 [7] SummarizedExperiment_1.0.2 Biobase_2.30.0            
 [9] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3        
[11] IRanges_2.4.8              S4Vectors_0.8.11          
[13] BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] genefilter_1.52.1    locfit_1.5-9.1       splines_3.2.5       
 [4] lattice_0.20-33      colorspace_1.3-1     htmltools_0.3.5     
 [7] chron_2.3-47         survival_2.40-1      XML_3.98-1.5        
[10] foreign_0.8-66       DBI_0.5-1            lambda.r_1.1.9      
[13] plyr_1.8.4           stringr_1.1.0        zlibbioc_1.16.0     
[16] munsell_0.4.3        gtable_0.2.0         futile.logger_1.4.3
[19] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.48.0  
[22] AnnotationDbi_1.32.3 htmlTable_1.7        acepack_1.4.1       
[25] xtable_1.8-2         scales_0.4.1         Hmisc_4.0-0         
[28] annotate_1.48.0      XVector_0.10.0       gridExtra_2.2.1     
[31] ggplot2_2.2.0        digest_0.6.10        stringi_1.1.2       
[34] grid_3.2.5           tools_3.2.5          magrittr_1.5        
[37] lazyeval_0.2.0       tibble_1.2           RSQLite_1.0.0       
[40] Formula_1.2-1        cluster_2.0.5        futile.options_1.0.0
[43] Matrix_1.2-4         data.table_1.9.6     assertthat_0.1      
[46] rpart_4.1-10         nnet_7.3-12         
> # Perform the differential expression (1.4)
> dds <- DESeq(dds, parallel=TRUE)

^^ This is where the code dies

If someone could help me figure out why DESeq2 is not allowing me to run without replicates (as I understand it is capable of despite encouraging you not to take any results as conclusive) that would be appreciated. Thanks.

ADD COMMENTlink modified 11 months ago by Michael Love15k • written 12 months ago by pro_zac0
gravatar for Simon Anders
12 months ago by
Simon Anders3.4k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.4k wrote:

DESeq2 is a tool to test statistical hypotheses, i.e., to calculate p values to quantify how much evidence you have for differential evidence. As you don't have replicates, you can't have p values.

For exploratory data analysis, you could consider using the "rlog" transformation and then just look at the genes with highest variance across conditions.

ADD COMMENTlink written 12 months ago by Simon Anders3.4k

Hi Simon,

Thanks for your response. From reading elsewhere I was given the impression that running DESeq() normally should still provide me with some level of results rather than crashing, but I may have misinterpreted these comments. I will look into how I can best analyse rlog data in as scientific a manner as possible, then.

ADD REPLYlink written 12 months ago by pro_zac0
gravatar for Michael Love
11 months ago by
Michael Love15k
United States
Michael Love15k wrote:

I can't replicate the "Execution halted" using simulated data and the current version of DESeq2 (1.14), but I do see a problem in the code base: here we should not replace outliers because there are no replicates (it only seems so for a period during the dispersion estimation when DESeq2 pretends the different conditions are replicates).

You should be able to bypass this by setting minReplicateForReplace=Inf, and I can fix this in future versions to not try to replace outliers when there are no replicates.

ADD COMMENTlink written 11 months ago by Michael Love15k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 332 users visited in the last hour