Dear Michael
This is a kind of a follow up of a previous post (https://support.bioconductor.org/p/59899/).
I am trying to use DESeq2 for counts data transformation. My data is coming from mapping to a de-novo assembled transcriptome so we therefore use eXpress for abundance estimation. When I try to run varianceStabilizingTransformation() or DESeq()received the error:
“Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans = geoMeans) : every gene contains at least one zero, cannot compute log geometric means”
I can understand that a log transformation would be problematic if there are zeros but it makes sense that some genes, in some conditions, have zero counts estimated (i.e. are not expressed). Am I missing something or is there a workaround like the y=log2(y+ 1)?
Sincerely,
Moran
R code
data <- read.table("matrix", header = T, row.names = 1, com= '')
data <- round(data)
data <- data[rowSums(data) >= 13,]
conditions = data.frame(conditions=factor(c(rep("inter ", 5),
rep("late ", 6),
rep("early ", 10),
rep("flo", 4),
rep("se", 4), )))
rownames(conditions) <- colnames(data)
ddsTable <- DESeqDataSetFromMatrix(countData = data,
colData = conditions,
design = ~ conditions)
vsd <- varianceStabilizingTransformation(ddsTable)
> as.data.frame(colData(ddsTable))
conditions
X10DP inter
X10DP inter
X10DP inter
X10DP inter
X10DP inter
X21DP late
X21DP late
X21DP late
X21DP late
X21DP late
X21DP late
X4DP early
X4DP early
X4DP early
X4DP early
X4DP early
X4DP early
X4DP early
flo_1 flo
flo_2 flo
flo_3 flo
flo_4 flo
se_1 se
se_2 se
se_3 se
se_4 se
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_0.6.1 BiocInstaller_1.14.3 DESeq2_1.4.5
[4] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3 GenomicRanges_1.16.4
[7] GenomeInfoDb_1.0.2 IRanges_1.22.10 BiocGenerics_0.10.0
[10] ggplot2_1.0.0 edgeR_3.6.8 limma_3.20.9
loaded via a namespace (and not attached):
[1] annotate_1.42.1 AnnotationDbi_1.26.1 base64enc_0.1-2 BatchJobs_1.4
[5] BBmisc_1.7 Biobase_2.24.0 brew_1.0-6 checkmate_1.5.0
[9] codetools_0.2-9 colorspace_1.2-4 DBI_0.3.1 digest_0.6.4
[13] evaluate_0.5.5 fail_1.2 foreach_1.4.2 formatR_1.0
[17] genefilter_1.46.1 geneplotter_1.42.0 grid_3.1.1 gtable_0.1.2
[21] iterators_1.0.7 knitr_1.7 lattice_0.20-29 locfit_1.5-9.1
[25] MASS_7.3-35 munsell_0.4.2 plyr_1.8.1 proto_0.3-10
[29] RColorBrewer_1.0-5 reshape2_1.4 RSQLite_0.11.4 scales_0.2.4
[33] sendmailR_1.2-1 splines_3.1.1 stats4_3.1.1 stringr_0.6.2
[37] survival_2.37-7 tools_3.1.1 XML_3.98-1.1 xtable_1.7-4
[41] XVector_0.4.0
hi Moran, I forgot to exponentiate the mean of log counts to get the geometric mean. I've now fixed the code in the answer below.
Hi,
I'm very new to DESeq, so I'm not sure if my comments will be of any use, but I got this problem
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc, :
every gene contains at least one zero, cannot compute log geometric means
when I had samples in rows and reads (amplicons, as I work on microbial communities) in columns. After transposing the data matrix the Error was gone.
Cheers,
Kasia
Try using:
dds <- estimateSizeFactors(dds, type="poscounts")
And if you have an error, post a new question, and supply all of your R code and sessionInfo()