DESEq2 error when performing exploratory analysis with no replicates
2
0
Entering edit mode
pro_zac • 0
@pro_zac-11908
Last seen 8.1 years ago

Hello,

I have recently been trying to analyse 12 non-replicated metagenome samples using DESeq2 as an exploratory analysis to find genes with differing abundance to allow for inferences to be obtained with respect to GO terms and KEGG pathways associated with these genes. I have formatted a counts table from a Bowtie2 mapping into HTSeq gene region count extraction procedure which I can successfully read into DESeq2. However, when I attempt to run the program, I receive the following error (note that I'm running it in parallel, and my data set is very, very large):

mean-dispersion relationship
final dispersion estimates, MLE betas: 12 workers
fitting model and testing: 12 workers
-- replacing outliers and refitting for 2151512 genes
-- DESeq argument 'minReplicatesForReplace' = 7
=-- original counts are preserved in counts(dds)
estimating dispersions
Error in estimateDispersionsGeneEst(objectSub, quiet = quiet, modelMatrix = modelMatrix) :
  the number of samples and the number of model coefficients are equal,
  i.e., there are no replicates to estimate the dispersion.
  use an alternate design formula
Calls: DESeq -> refitWithoutOutliers -> estimateDispersionsGeneEst
In addition: Warning message:
In checkForExperimentalReplicates(object, modelMatrix) :
  same number of samples and coefficients to fit,
  estimating dispersion by treating samples as replicates.
  read the ?DESeq section on 'Experiments without replicates'
Execution halted

The R output including headers for my input files as well as the sessionInfo are provided below:

R version 3.2.5 (2016-04-14) -- "Very, Very Secure Dishes"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> ############## START
>
> # Load deseq2 and set working directory
> library(DESeq2)
> library(pheatmap)
> library(RColorBrewer)
> library(BiocParallel)
> setwd("/home/workingdir")
>
> register(MulticoreParam(12))
>
> # Load in data
> countData = as.matrix(read.table(file='combined.htseq.counts', header = TRUE, sep = "\t", stringsAsFactors = FALSE, row.names="gene_id"))
> head(countData)
              AR DW GH1  LS   OF STP1 STP2 STP11A STP12B STP13A STP15C  WS
OF0002336      0  8   0   0 1744    0    0      0      0      0      0   0
OF0057878      0  0   0   0 1560    0    6      2      4      0      0   0
OF0008217      0  0   0   0 1004    0    0      0      0      0      0   0
STP11A0014293  0  0   2   0   33    2  378    697   1674    294    474   0
LS0032436      0  0   0 981    0    0    0      0      0      1      0 149
OF0055859      0  0   0   0  892    0    0      0      1      1      0   0
>
> colData = read.csv('colData.csv')
> head(colData)
  condition
1        AR
2        DW
3       GH1
4        LS
5        OF
6      STP1
>
> # Create DESeq2 data set
> dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData, design = ~ condition)
> dds <- dds[ rowSums(counts(dds)) > 1, ]   # 1.3.6 Pre-filtering - removes genes for which only 0 or 1 reads are present
>
> # Reduce RAM usage
> rm(countData)
> rm(colData)
>
> # Session info
> sessionInfo()
R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 11 SP4

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] BiocParallel_1.4.3         RColorBrewer_1.1-2        
 [3] pheatmap_1.0.8             DESeq2_1.10.1             
 [5] RcppArmadillo_0.7.500.0.0  Rcpp_0.12.8               
 [7] SummarizedExperiment_1.0.2 Biobase_2.30.0            
 [9] GenomicRanges_1.22.4       GenomeInfoDb_1.6.3        
[11] IRanges_2.4.8              S4Vectors_0.8.11          
[13] BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] genefilter_1.52.1    locfit_1.5-9.1       splines_3.2.5       
 [4] lattice_0.20-33      colorspace_1.3-1     htmltools_0.3.5     
 [7] chron_2.3-47         survival_2.40-1      XML_3.98-1.5        
[10] foreign_0.8-66       DBI_0.5-1            lambda.r_1.1.9      
[13] plyr_1.8.4           stringr_1.1.0        zlibbioc_1.16.0     
[16] munsell_0.4.3        gtable_0.2.0         futile.logger_1.4.3
[19] latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.48.0  
[22] AnnotationDbi_1.32.3 htmlTable_1.7        acepack_1.4.1       
[25] xtable_1.8-2         scales_0.4.1         Hmisc_4.0-0         
[28] annotate_1.48.0      XVector_0.10.0       gridExtra_2.2.1     
[31] ggplot2_2.2.0        digest_0.6.10        stringi_1.1.2       
[34] grid_3.2.5           tools_3.2.5          magrittr_1.5        
[37] lazyeval_0.2.0       tibble_1.2           RSQLite_1.0.0       
[40] Formula_1.2-1        cluster_2.0.5        futile.options_1.0.0
[43] Matrix_1.2-4         data.table_1.9.6     assertthat_0.1      
[46] rpart_4.1-10         nnet_7.3-12         
>
> # Perform the differential expression (1.4)
> dds <- DESeq(dds, parallel=TRUE)

^^ This is where the code dies

If someone could help me figure out why DESeq2 is not allowing me to run without replicates (as I understand it is capable of despite encouraging you not to take any results as conclusive) that would be appreciated. Thanks.

software error deseq2 metagenomics • 6.6k views
ADD COMMENT
1
Entering edit mode
Simon Anders ★ 3.8k
@simon-anders-3855
Last seen 4.4 years ago
Zentrum für Molekularbiologie, Universi…

DESeq2 is a tool to test statistical hypotheses, i.e., to calculate p values to quantify how much evidence you have for differential evidence. As you don't have replicates, you can't have p values.

For exploratory data analysis, you could consider using the "rlog" transformation and then just look at the genes with highest variance across conditions.

ADD COMMENT
0
Entering edit mode

Hi Simon,

Thanks for your response. From reading elsewhere I was given the impression that running DESeq() normally should still provide me with some level of results rather than crashing, but I may have misinterpreted these comments. I will look into how I can best analyse rlog data in as scientific a manner as possible, then.

ADD REPLY
1
Entering edit mode
@mikelove
Last seen 4 hours ago
United States

I can't replicate the "Execution halted" using simulated data and the current version of DESeq2 (1.14), but I do see a problem in the code base: here we should not replace outliers because there are no replicates (it only seems so for a period during the dispersion estimation when DESeq2 pretends the different conditions are replicates).

You should be able to bypass this by setting minReplicateForReplace=Inf, and I can fix this in future versions to not try to replace outliers when there are no replicates.

ADD COMMENT

Login before adding your answer.

Traffic: 812 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6