Hello Michael. I am a graduate student neuroscience researcher
attempting to use the DESeq2 package to perform differential
expression analysis of my sequencing data. I am following the beginner
vignette but substituting my own data into the code. I managed to get
to the part where it tells me to call the DESeq function, but I
received the following error:
> dds <- DESeq(ddsFull)
estimating size factors
Error in estimateSizeFactorsForMatrix(counts(object), locfunc,
geoMeans = geoMeans) :
every gene contains at least one zero, cannot compute log geometric
means
-----
My data was generated from FASTQ files from the sequencer, which I
quality/adapter trimmed, and then aligned to our reference genome
using the programs STAR and Bowtie2. The unmapped reads from the STAR
program were subsequently run through Bowtie2 and the SAM file outputs
from both alignment programs were combined using Picard-Tools
MergeSAM. The merged SAM files were then converted to BAM files and I
began the DESeq2 beginner tutorial.
Could you please help me or direct me to a source where I might find a
solution to my error problem? A "Google search" on the error did not
return useful results. Thank you very much.
Best,
Caleb Bostwick
-- output of sessionInfo():
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets
methods base
other attached packages:
[1] DESeq2_1.4.5 RcppArmadillo_0.4.300.0 Rcpp_0.11.1
GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.0
Biostrings_2.32.0 XVector_0.4.0
[9] GenomicFeatures_1.16.1 AnnotationDbi_1.26.0 Biobase_2.24.0
GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.7
BiocGenerics_0.10.0 BiocInstaller_1.14.2
loaded via a namespace (and not attached):
[1] annotate_1.42.0 BatchJobs_1.2 BBmisc_1.6
BiocParallel_0.6.1 biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6
codetools_0.2-8 DBI_0.2-7 digest_0.6.4
[11] fail_1.2 foreach_1.4.2 genefilter_1.46.1
geneplotter_1.42.0 grid_3.1.0 iterators_1.0.7
lattice_0.20-29 locfit_1.5-9.1 plyr_1.8.1
RColorBrewer_1.0-5
[21] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.24.1
sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2
survival_2.37-7 tools_3.1.0 XML_3.98-1.1
[31] xtable_1.7-3 zlibbioc_1.10.0
--
Sent via the guest posting facility at bioconductor.org.
Dear Caleb
On 30/05/14 16:59, Caleb Bostwick [guest] wrote:
> Could you please help me or direct me to a source where I might find
> a solution to my error problem? A "Google search" on the error did
> not return useful results. Thank you very much.
For starters, check whether the claim of the error message is actually
true:
> Error in estimateSizeFactorsForMatrix(counts(object), locfunc,
> geoMeans = geoMeans) :
> every gene contains at least one zero, cannot compute log
> geometric means
Does every gene contain a zero in at least one of the samples? If so,
how come?
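
One way to run that check is sketched below in base R. The toy matrix and the names cts and has_zero are made up for illustration; with real data one would presumably start from cts <- counts(ddsFull) instead.

```r
# Toy count matrix standing in for counts(ddsFull); the values are invented.
cts <- matrix(c(0, 5, 3,
                2, 0, 7,
                4, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("sample", 1:3)))

# TRUE for every gene (row) that has a zero count in at least one sample.
has_zero <- rowSums(cts == 0) > 0

# If this is TRUE, the claim in the error message holds for this matrix.
all(has_zero)

# Zeros per sample: a sample with a zero for nearly every gene may point to
# a counting or annotation problem rather than to the biology.
colSums(cts == 0)
```

If a single sample accounts for most of the zeros, it is probably worth looking at how that sample was counted before changing anything in the DESeq2 call.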
Simon
I have come across this problem as well. However, I don't understand why this is a problem... I have 96 samples, and each one of the 25,000 genes I estimated counts for is 0 in at least one of those samples. I don't see why this would cause the dispersion estimate to fail?
Because there is this snippet of code that needs to be run in the course of size factor estimation:
And if you have a count matrix where every gene has a 0 in at least one sample:
Then every entry of loggeomeans will be infinite (-Inf), and you're hosed.
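
The reasoning above can be reproduced with a small base R sketch. It assumes, based on the error message and the loggeomeans name mentioned here, that the log geometric mean of each gene is computed as the row mean of the logged counts; the toy matrices cts and cts2 are invented for illustration and this is not a copy of the DESeq2 source.

```r
# Toy counts: every gene (row) has a zero in at least one sample (column).
cts <- matrix(c(0, 5, 3,
                2, 0, 7,
                4, 1, 0),
              nrow = 3, byrow = TRUE)

# log(0) is -Inf, so the mean of the logs is -Inf for any row with a zero.
loggeomeans <- rowMeans(log(cts))
loggeomeans

# No gene is left with a finite log geometric mean, so there is nothing to
# take a median of when computing the size factors.
all(is.infinite(loggeomeans))

# For contrast: adding one gene without zeros gives one finite entry.
cts2 <- rbind(cts, c(1, 10, 100))
is.finite(rowMeans(log(cts2)))
```

This also shows why the error usually does not appear: a single gene with nonzero counts in all samples is enough to get past this particular check, although a handful of such genes would make for a poor size factor estimate.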
If you just want to do "something" and move on, you could try supplying a custom set of sizeFactors and not use DESeq2's default. A reasonable choice of alternate size factors might be calculated using edgeR's TMM method, ie:
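
A minimal sketch of that idea follows, assuming the DESeq2 and edgeR packages are installed. The simulated counts, the two-group design and the variable names are made up so the example runs on its own; with real data the DESeqDataSet from the original post (ddsFull) would take the place of the toy dds. Rescaling the TMM effective library sizes to a geometric mean of 1 is one reasonable convention, not something the thread prescribes.

```r
library(DESeq2)
library(edgeR)

# Simulated counts in which every gene has a zero in at least one sample.
set.seed(1)
n_genes <- 200
n_samples <- 6
cts <- matrix(rnbinom(n_genes * n_samples, mu = 50, size = 2),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", seq_len(n_samples))))
zero_col <- sample(n_samples, n_genes, replace = TRUE)
cts[cbind(seq_len(n_genes), zero_col)] <- 0L

# Hypothetical two-group design, three samples per group.
coldata <- data.frame(condition = factor(rep(c("A", "B"), each = 3)),
                      row.names = colnames(cts))
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ condition)

# TMM normalization factors from edgeR, turned into effective library sizes.
norm_factors <- calcNormFactors(counts(dds), method = "TMM")
eff_lib_size <- colSums(counts(dds)) * norm_factors

# Rescale to geometric mean 1 so the values are on the usual size factor
# scale, then store them in the object.
sizeFactors(dds) <- eff_lib_size / exp(mean(log(eff_lib_size)))
sizeFactors(dds)
```

With the size factors already set, one option is to continue with estimateDispersions(dds) followed by nbinomWaldTest(dds) rather than calling DESeq(), so that the default size factor estimation is not run again; whether DESeq() itself reuses pre-set size factors may depend on the DESeq2 version.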