DESeq2 Error in estimateSizeFactorsForMatrix
Guest User ★ 12k
@guest-user-4897
Last seen 7.1 years ago
Hello Michael.

I am a graduate student neuroscience researcher attempting to use the DESeq2 package to perform differential expression analysis of my sequencing data. I am following the beginner vignette but substituting my own data into the code. I managed to get to the part where it tells me to call the DESeq function, but I received the following error:

> dds <- DESeq(ddsFull)
estimating size factors
Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans = geoMeans) :
  every gene contains at least one zero, cannot compute log geometric means

My data was generated from FASTQ files from the sequencer, which I quality/adapter trimmed, and then aligned to our reference genome using the programs STAR and Bowtie2. The unmapped reads from the STAR program were subsequently run through Bowtie2 and the SAM file outputs from both alignment programs were combined using Picard-Tools MergeSAM. The merged SAM files were then converted to BAM files and I began the DESeq2 beginner tutorial.

Could you please help me or direct me to a source where I might find a solution to my error problem? A "Google search" on the error did not return useful results. Thank you very much.

Best,
Caleb Bostwick

-- output of sessionInfo():

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.4.5 RcppArmadillo_0.4.300.0 Rcpp_0.11.1 GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.0 Biostrings_2.32.0 XVector_0.4.0
[9] GenomicFeatures_1.16.1 AnnotationDbi_1.26.0 Biobase_2.24.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.7 BiocGenerics_0.10.0 BiocInstaller_1.14.2

loaded via a namespace (and not attached):
[1] annotate_1.42.0 BatchJobs_1.2 BBmisc_1.6 BiocParallel_0.6.1 biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4
[11] fail_1.2 foreach_1.4.2 genefilter_1.46.1 geneplotter_1.42.0 grid_3.1.0 iterators_1.0.7 lattice_0.20-29 locfit_1.5-9.1 plyr_1.8.1 RColorBrewer_1.0-5
[21] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.24.1 sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 tools_3.1.0 XML_3.98-1.1
[31] xtable_1.7-3 zlibbioc_1.10.0

-- Sent via the guest posting facility at bioconductor.org.
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 14 months ago
Zentrum für Molekularbiologie, Universi…
Dear Caleb

On 30/05/14 16:59, Caleb Bostwick [guest] wrote:
> Could you please help me or direct me to a source where I might find
> a solution to my error problem? A "Google search" on the error did
> not return useful results. Thank you very much.

For starters, check whether the claim of the error message is actually true:

> Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans = geoMeans) :
> every gene contains at least one zero, cannot compute log geometric means

Does every gene contain a zero in at least one of the samples? If so, how come?

Simon
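
One way to run the check suggested above is sketched here in base R. The small matrix is made-up data standing in for counts(ddsFull) (genes in rows, samples in columns), and the variable names are only illustrative.

```r
# Toy count matrix standing in for counts(ddsFull); values are invented.
counts <- matrix(c(5, 0, 3,
                   2, 4, 0,
                   7, 1, 9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("sample", 1:3)))

# Number of zero counts in each gene (row).
zeros_per_gene <- rowSums(counts == 0)

# Genes with no zero in any sample; the error is raised when this is 0.
genes_without_zero <- sum(zeros_per_gene == 0)

# Fraction of genes that are zero in each sample, to spot an empty or
# nearly empty sample that puts a zero in every row.
zero_fraction_per_sample <- colMeans(counts == 0)

print(genes_without_zero)
print(zero_fraction_per_sample)
```

If a single sample has a zero fraction close to 1, that sample (for example an empty or mislabelled BAM file) is a likely cause and worth inspecting before anything else.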
gaelgarcia • 0
@gaelgarcia-8035
Last seen 9 months ago
UK

I have come across this problem as well. However, I don't understand why this is a problem... I have 96 samples, and each one of the 25,000 genes I estimated counts for is 0 in at least one of those samples. I don't see why this would cause the dispersion estimate to fail?


Because there is this snippet of code that needs to be run in the course of size factor estimation:

loggeomeans <- rowMeans(log(counts))

And if you have a count matrix where every gene has a 0 in at least one sample:

counts <- matrix(1:100, 10)
diag(counts) <- 0
loggeomeans <- rowMeans(log(counts))

Then every entry of loggeomeans will be negative infinity, because log(0) is -Inf and any row mean that includes -Inf is itself -Inf:

loggeomeans
[1] -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf -Inf

... and you're hosed.
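
To make the failure concrete, here is a minimal base-R sketch of a median-of-ratios size factor calculation. It is a simplified illustration and not DESeq2's actual source; the function name median_ratio_size_factors and the shortened error text are made up for this example. Only genes with a finite log geometric mean can serve as the reference, so when none are left there is nothing to take a median over.

```r
# Simplified median-of-ratios size factors (illustrative sketch only).
median_ratio_size_factors <- function(counts) {
  loggeomeans <- rowMeans(log(counts))   # -Inf for any gene containing a zero
  usable <- is.finite(loggeomeans)       # genes that can act as reference
  if (!any(usable)) {
    stop("every gene contains at least one zero")
  }
  apply(counts, 2, function(cnts) {
    keep <- usable & cnts > 0
    # Median log ratio of this sample to the per-gene geometric mean.
    exp(median(log(cnts[keep]) - loggeomeans[keep]))
  })
}

# Works when at least some genes have no zeros.
ok <- matrix(1:100, 10)
sf_ok <- median_ratio_size_factors(ok)

# Fails once every gene has a zero in some sample.
bad <- ok
diag(bad) <- 0
result <- tryCatch(median_ratio_size_factors(bad),
                   error = function(e) conditionMessage(e))
print(result)
```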

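If you just want to do "something" and move on, you could supply a custom set of size factors instead of using DESeq2's default. A reasonable alternative might be based on edgeR's TMM method. Note that calcNormFactors returns factors that scale the library sizes rather than size factors themselves, so multiply them by the library sizes and rescale to a geometric mean of 1, e.g.:

tmm <- edgeR::calcNormFactors(counts(dds)) * colSums(counts(dds))
sizeFactors(dds) <- tmm / exp(mean(log(tmm)))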
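
Another way around the error, staying with the median-of-ratios idea, is to compute each gene's geometric mean from its positive counts only, so that a single zero no longer sends the mean to -Inf. The sketch below does this in base R on the toy matrix from above; the function names geo_mean_pos and size_factors_pos are hypothetical, and this is a simplified illustration rather than DESeq2's code. Recent DESeq2 releases offer a comparable option through estimateSizeFactors(dds, type = "poscounts"), which may be worth trying first if your installed version supports it.

```r
# Geometric mean that skips zeros in the sum but still divides by the
# number of samples (returns 0 for an all-zero gene).
geo_mean_pos <- function(x) {
  if (all(x == 0)) return(0)
  exp(sum(log(x[x > 0])) / length(x))
}

# Median-of-ratios size factors using the zero-tolerant geometric means.
size_factors_pos <- function(counts) {
  loggeomeans <- log(apply(counts, 1, geo_mean_pos))
  sf <- apply(counts, 2, function(cnts) {
    keep <- is.finite(loggeomeans) & cnts > 0
    exp(median(log(cnts[keep]) - loggeomeans[keep]))
  })
  sf / exp(mean(log(sf)))  # rescale so the size factors have geometric mean 1
}

# Same toy matrix as above: every gene has exactly one zero.
counts <- matrix(1:100, 10)
diag(counts) <- 0
sf <- size_factors_pos(counts)
print(round(sf, 3))
```

With a real data set you would pass counts(dds) to the function and assign the result with sizeFactors(dds) <- before calling DESeq. Whichever workaround you pick, it is still worth finding out why every gene has a zero, since that often points to a problem with one or more samples.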