Search
Question: DESeq2 Error in varianceStabilizingTransformation
1
gravatar for mfarhi
3.6 years ago by
mfarhi10
United States
mfarhi10 wrote:

Dear Michael

This is a kind of a follow up of a previous post (https://support.bioconductor.org/p/59899/).

I am trying to use DESeq2 for counts data transformation. My data is coming from mapping to a de-novo assembled transcriptome so we therefore use eXpress for abundance estimation. When I try to run varianceStabilizingTransformation() or DESeq()received the error:

“Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans = geoMeans) : every gene contains at least one zero, cannot compute log geometric means”

I can understand that a log transformation would be problematic if there are zeros but it makes sense that some genes, in some conditions, have zero counts estimated (i.e. are not expressed). Am I missing something or is there a workaround like the y=log2(y+ 1)?

 

Sincerely,

 

Moran   

 

R code

data <- read.table("matrix", header = T, row.names = 1, com= '')

data <- round(data)

data <- data[rowSums(data) >= 13,]

conditions = data.frame(conditions=factor(c(rep("inter ", 5),

                                                                             rep("late ", 6),

                                                                             rep("early ", 10),

                                                                             rep("flo", 4),

                                                                             rep("se", 4), )))

rownames(conditions) <- colnames(data)

ddsTable <- DESeqDataSetFromMatrix(countData = data,

                                   colData = conditions,

                                   design = ~ conditions)

vsd <- varianceStabilizingTransformation(ddsTable)

 

> as.data.frame(colData(ddsTable))

                                                conditions

X10DP            inter

X10DP           inter

X10DP            inter

X10DP            inter

X10DP            inter

X21DP             late

X21DP             late

X21DP             late

X21DP             late

X21DP             late

X21DP             late

X4DP             early

X4DP             early

X4DP             early

X4DP             early

X4DP             early

X4DP             early

X4DP             early

flo_1           flo

flo_2           flo

flo_3           flo

flo_4           flo

se_1               se

se_2               se

se_3               se

se_4               se

 

> sessionInfo()

R version 3.1.1 (2014-07-10)

Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] parallel  stats     graphics  grDevices utils     datasets  methods   base    

other attached packages:

 [1] BiocParallel_0.6.1        BiocInstaller_1.14.3      DESeq2_1.4.5            

 [4] RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3               GenomicRanges_1.16.4     

 [7] GenomeInfoDb_1.0.2        IRanges_1.22.10           BiocGenerics_0.10.0     

[10] ggplot2_1.0.0             edgeR_3.6.8               limma_3.20.9            

loaded via a namespace (and not attached):

 [1] annotate_1.42.1      AnnotationDbi_1.26.1 base64enc_0.1-2      BatchJobs_1.4      

 [5] BBmisc_1.7           Biobase_2.24.0       brew_1.0-6           checkmate_1.5.0    

 [9] codetools_0.2-9      colorspace_1.2-4     DBI_0.3.1            digest_0.6.4       

[13] evaluate_0.5.5       fail_1.2             foreach_1.4.2        formatR_1.0        

[17] genefilter_1.46.1    geneplotter_1.42.0   grid_3.1.1           gtable_0.1.2       

[21] iterators_1.0.7      knitr_1.7            lattice_0.20-29      locfit_1.5-9.1     

[25] MASS_7.3-35          munsell_0.4.2        plyr_1.8.1           proto_0.3-10       

[29] RColorBrewer_1.0-5   reshape2_1.4         RSQLite_0.11.4       scales_0.2.4       

[33] sendmailR_1.2-1      splines_3.1.1        stats4_3.1.1         stringr_0.6.2      

[37] survival_2.37-7      tools_3.1.1          XML_3.98-1.1         xtable_1.7-4       

[41] XVector_0.4.0

ADD COMMENTlink modified 4 weeks ago by kasia_piwosz0 • written 3.6 years ago by mfarhi10

hi Moran, I forgot to exponentiate the mean of log counts to get the geometric mean. I've now fixed the code in the answer below.

ADD REPLYlink written 3.6 years ago by Michael Love17k

Hi,

I'm very new to DESeq, so I'm not sure if my comments will be of any use, but I got this problem


Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  :

  every gene contains at least one zero, cannot compute log geometric means


when I had samples in rows and reads (amplicons, as I work on microbial communities) in columns. After transposing the data matrix the Error was gone.

Cheers,

Kasia

ADD REPLYlink written 4 weeks ago by kasia_piwosz0

Try using:

dds <- estimateSizeFactors(dds, type="poscounts")

And if you have an error, post a new question, and supply all of your R code and sessionInfo()

ADD REPLYlink written 4 weeks ago by Michael Love17k
2
gravatar for Michael Love
3.6 years ago by
Michael Love17k
United States
Michael Love17k wrote:

hi Moran,

The median ratio method defined in the first DESeq paper requires calculation of the geometric mean of a row, which is 0 if a single count has a 0, and then taking the ratio of a sample to the geometric mean.

I would first filter the dataset to remove rows with very few counts (with x set to 5 say):

dds <- dds[ rowSums(counts(dds)) > x, ]

Then one alternative estimator I have used when there are no rows without a zero is:

cts <- counts(dds)
geoMeans <- apply(cts, 1, function(row) if (all(row == 0)) 0 else exp(mean(log(row[row != 0]))))
dds <- estimateSizeFactors(dds, geoMeans=geoMeans)

 

ADD COMMENTlink modified 2.7 years ago • written 3.6 years ago by Michael Love17k
1

Better is to use exp(sum(log(row[row != 0]))/length(row)). This came up through evolving use cases in the phyloseq package by Paul (Joey) McMurdie. I'm now adding this alternate estimator as an official "type" to the estimateSizeFactors function, so that it can be documented.

ADD REPLYlink written 14 months ago by Michael Love17k
0
gravatar for mfarhi
3.6 years ago by
mfarhi10
United States
mfarhi10 wrote:

Hi Michael

Thank you very much for the reply, and fix. I do not have access to my data now but will try give a try next week when I'm back

Moran 

ADD COMMENTlink written 3.6 years ago by mfarhi10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 331 users visited in the last hour