Search
Question: DESeq2 Error: rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
1
gravatar for erin.gill81
2.8 years ago by
erin.gill8130
Canada
erin.gill8130 wrote:

Hello,

I am trying to analyse a dataset that has data from three different sequencing runs, two cell types (one is a homozygous knockout of the other) and four stimuli (plus control). I know from previous analyses of this data that there are batch effects, so I am using a design formula that looks like the following:

design = ~ run + celltype + stimulus + celltype:stimulus

When I attempt to run DESeq2 using this design formula, I get the error "1650 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest"

I don't get the error when I combine celltype and stimulus into a single variable "condition".

I've searched for this error in the manual and online, but cannot figure out what it means. Could somebody please explain it?

Thanks!

Erin

 

 

output of sessionInfo(): 

R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] DESeq2_1.6.3            RcppArmadillo_0.4.600.0 Rcpp_0.11.3            
[4] GenomicRanges_1.18.4    GenomeInfoDb_1.2.4      IRanges_2.0.1          
[7] S4Vectors_0.4.0         BiocGenerics_0.12.1    

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1
 [4] base64enc_0.1-2      BatchJobs_1.5        BBmisc_1.8          
 [7] Biobase_2.26.0       BiocParallel_1.0.0   brew_1.0-6          
[10] checkmate_1.5.1      cluster_2.0.1        codetools_0.2-10    
[13] colorspace_1.2-4     DBI_0.3.1            digest_0.6.8        
[16] fail_1.2             foreach_1.4.2        foreign_0.8-62      
[19] Formula_1.2-0        genefilter_1.48.1    geneplotter_1.44.0  
[22] ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2        
[25] Hmisc_3.14-6         iterators_1.0.7      lattice_0.20-29     
[28] latticeExtra_0.6-26  locfit_1.5-9.1       MASS_7.3-37         
[31] munsell_0.4.2        nnet_7.3-9           plyr_1.8.1          
[34] proto_0.3-10         RColorBrewer_1.1-2   reshape2_1.4.1      
[37] rpart_4.1-9          RSQLite_1.0.0        scales_0.2.4        
[40] sendmailR_1.2-1      splines_3.1.2        stringr_0.6.2       
[43] survival_2.37-7      tools_3.1.2          XML_3.98-1.1        
[46] xtable_1.7-4         XVector_0.6.0  

 

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by erin.gill8130
0
gravatar for Michael Love
2.8 years ago by
Michael Love15k
United States
Michael Love15k wrote:

hi Erin,

Can you show what the column data looks like? It might be that there is some near-collinearity which poses a problem for the fitting algorithm. Also can you provide all the code you are using for construction and running DESeq2?

ADD COMMENTlink written 2.8 years ago by Michael Love15k
0
gravatar for erin.gill81
2.8 years ago by
erin.gill8130
Canada
erin.gill8130 wrote:

The code that I'm using is as follows:

#start R

R

#read in new sample sheet 
samples = read.csv("samples.csv",header=TRUE)

#read in count matrix
counts = read.delim("sumtechreps.csv", col.names=samples$techreps)

#attach the column data to from the sample sheet to the variable coldata
coldata = with(samples,data.frame(shortname = I(shortname), celltype = celltype, run = run, stimulus = stimulus))

#attach the count data to the variable countdata
countdata = counts

#start DESeq2
library("DESeq2")

#construct your DESeq2 data set, making sure to specify the design matrix here
dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ run + celltype + stimulus + celltype:stimulus)

#relevel controls
dds$celltype <- relevel(dds$celltype, "E")
dds$stimulus <- relevel(dds$stimulus, "C4")

#run DESeq2 on the dataset
dds <- DESeq(dds)

 

Here is what coldata looks like:

      shortname celltype run stimulus
1  2_E14-Co....        E  r2       C4
2     2_E14-CpG        E  r2      CpG
3  2_E14-TRIDAP        E  r2        T
4     2_E14-LPS        E  r2        L
5  2_E14-In....        E  r2      I24
6  2_IRAK1-....        I  r2       C4
7   2_IRAK1-CpG        I  r2      CpG
8  2_IRAK1-....        I  r2        T
9  2_IRAK1-....        I  r2      I24
10  2_IRAK1-LPS        I  r2        L
11    3_E14-CpG        E  r3      CpG
12 3_E14-TRIDAP        E  r3        T
13 3_E14-Co....        E  r3       C4
14    3_E14-LPS        E  r3        L
15 3_E14-In....        E  r3      I24
16  3_IRAK1-CpG        I  r3      CpG
17 3_IRAK1-....        I  r3        T
18 3_IRAK1-....        I  r3       C4
19  3_IRAK1-LPS        I  r3        L
20 3_IRAK1-....        I  r3      I24
21 1_E14-Co....        E  r1       C4
22    1_E14-LPS        E  r1        L
23    1_E14-CpG        E  r1      CpG
24 1_E14-TRIDAP        E  r1        T
25 1_Influe....        E  r1      I24
26  1_IRAK1-LPS        I  r1        L
27  1_IRAK1-CpG        I  r1      CpG
28 1_IRAK1-....        I  r1        T
29 1_IRAK1-....        I  r1       C4
30 1_IRAK1-....        I  r1      I24

 

 

ADD COMMENTlink written 2.8 years ago by erin.gill8130

Well there's nothing wrong with the design, it's perfectly balanced. Sometimes the GLM code can have an issue converging when there is a single count in a row of all 0's and there are many interactions terms. Can you try first subsetting out the genes with small counts? We don't have this kind of code in the manual, because these rows are omitted from multiple test correction automatically, but it is a perfectly valid filtering step. Some code which says we want a normalized count of at least 10 in two or more samples:

dds <- estimateSizeFactors(dds)
nc <- counts(dds, normalized=TRUE)
filter <- rowSums(nc >= 10) >= 2
dds <- dds[filter,]

...then run DESeq()

Hopefully this would eliminate the convergence problems. If not, ping this thread and I will take a look.

ADD REPLYlink written 2.8 years ago by Michael Love15k

This has ameliorated the problem, but not completely solved it. Now instead of having 1650 rows not converge, I am left with 186 rows that don't converge. I also tried omitting genes with less than 15 counts in two or more samples. This gives me 131 rows that don't converge.

ADD REPLYlink written 2.8 years ago by erin.gill8130

Some more options, try increasing the 2 sample requirement to 3 or 4.

To deal with any remaining rows, you can either omit them from the results step: ddsClean <- dds[which(mcols(dds)$betaConv),], because these are typically genes with very small counts and little power, or you can increase the maximum iterations with the following code. Instead of running DESeq(), run:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds, maxit=500)
ADD REPLYlink written 2.8 years ago by Michael Love15k

Thank you so much for your help. A combination of adding the filtering step and increasing the iterations have fixed the problem.

ADD REPLYlink written 2.7 years ago by erin.gill8130
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 145 users visited in the last hour