Hello,
I am trying to analyse a dataset that has data from three different sequencing runs, two cell types (one is a homozygous knockout of the other) and four stimuli (plus control). I know from previous analyses of this data that there are batch effects, so I am using a design formula that looks like the following:
design = ~ run + celltype + stimulus + celltype:stimulus
When I attempt to run DESeq2 using this design formula, I get the error "1650 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest"
I don't get the error when I combine celltype and stimulus into a single variable "condition".
I've searched for this error in the manual and online, but cannot figure out what it means. Could somebody please explain it?
Thanks!
Erin
output of sessionInfo():
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.6.3 RcppArmadillo_0.4.600.0 Rcpp_0.11.3
[4] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 IRanges_2.0.1
[7] S4Vectors_0.4.0 BiocGenerics_0.12.1
loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1
[4] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8
[7] Biobase_2.26.0 BiocParallel_1.0.0 brew_1.0-6
[10] checkmate_1.5.1 cluster_2.0.1 codetools_0.2-10
[13] colorspace_1.2-4 DBI_0.3.1 digest_0.6.8
[16] fail_1.2 foreach_1.4.2 foreign_0.8-62
[19] Formula_1.2-0 genefilter_1.48.1 geneplotter_1.44.0
[22] ggplot2_1.0.0 grid_3.1.2 gtable_0.1.2
[25] Hmisc_3.14-6 iterators_1.0.7 lattice_0.20-29
[28] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-37
[31] munsell_0.4.2 nnet_7.3-9 plyr_1.8.1
[34] proto_0.3-10 RColorBrewer_1.1-2 reshape2_1.4.1
[37] rpart_4.1-9 RSQLite_1.0.0 scales_0.2.4
[40] sendmailR_1.2-1 splines_3.1.2 stringr_0.6.2
[43] survival_2.37-7 tools_3.1.2 XML_3.98-1.1
[46] xtable_1.7-4 XVector_0.6.0
Well there's nothing wrong with the design, it's perfectly balanced. Sometimes the GLM code can have an issue converging when there is a single count in a row of all 0's and there are many interactions terms. Can you try first subsetting out the genes with small counts? We don't have this kind of code in the manual, because these rows are omitted from multiple test correction automatically, but it is a perfectly valid filtering step. Some code which says we want a normalized count of at least 10 in two or more samples:
...then run DESeq()
Hopefully this would eliminate the convergence problems. If not, ping this thread and I will take a look.
This has ameliorated the problem, but not completely solved it. Now instead of having 1650 rows not converge, I am left with 186 rows that don't converge. I also tried omitting genes with less than 15 counts in two or more samples. This gives me 131 rows that don't converge.
Some more options, try increasing the 2 sample requirement to 3 or 4.
To deal with any remaining rows, you can either omit them from the results step:
ddsClean <- dds[which(mcols(dds)$betaConv),]
, because these are typically genes with very small counts and little power, or you can increase the maximum iterations with the following code. Instead of running DESeq(), run:Thank you so much for your help. A combination of adding the filtering step and increasing the iterations have fixed the problem.