DESeq2 Error: rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
2
4
Entering edit mode
erin.gill81 ▴ 60
@eringill81-6831
Last seen 4.7 years ago
Canada

Hello,

I am trying to analyse a dataset that has data from three different sequencing runs, two cell types (one is a homozygous knockout of the other) and four stimuli (plus control). I know from previous analyses of this data that there are batch effects, so I am using a design formula that looks like the following:

design = ~ run + celltype + stimulus + celltype:stimulus

When I attempt to run DESeq2 using this design formula, I get the error "1650 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest"

I don't get the error when I combine celltype and stimulus into a single variable "condition".

I've searched for this error in the manual and online, but cannot figure out what it means. Could somebody please explain it?

Thanks!

Erin

 

 

output of sessionInfo(): 

R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] DESeq2_1.6.3            RcppArmadillo_0.4.600.0 Rcpp_0.11.3            
[4] GenomicRanges_1.18.4    GenomeInfoDb_1.2.4      IRanges_2.0.1          
[7] S4Vectors_0.4.0         BiocGenerics_0.12.1    

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3      annotate_1.44.0      AnnotationDbi_1.28.1
 [4] base64enc_0.1-2      BatchJobs_1.5        BBmisc_1.8          
 [7] Biobase_2.26.0       BiocParallel_1.0.0   brew_1.0-6          
[10] checkmate_1.5.1      cluster_2.0.1        codetools_0.2-10    
[13] colorspace_1.2-4     DBI_0.3.1            digest_0.6.8        
[16] fail_1.2             foreach_1.4.2        foreign_0.8-62      
[19] Formula_1.2-0        genefilter_1.48.1    geneplotter_1.44.0  
[22] ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2        
[25] Hmisc_3.14-6         iterators_1.0.7      lattice_0.20-29     
[28] latticeExtra_0.6-26  locfit_1.5-9.1       MASS_7.3-37         
[31] munsell_0.4.2        nnet_7.3-9           plyr_1.8.1          
[34] proto_0.3-10         RColorBrewer_1.1-2   reshape2_1.4.1      
[37] rpart_4.1-9          RSQLite_1.0.0        scales_0.2.4        
[40] sendmailR_1.2-1      splines_3.1.2        stringr_0.6.2       
[43] survival_2.37-7      tools_3.1.2          XML_3.98-1.1        
[46] xtable_1.7-4         XVector_0.6.0  

 

deseq2 • 12k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 14 hours ago
United States

hi Erin,

Can you show what the column data looks like? It might be that there is some near-collinearity which poses a problem for the fitting algorithm. Also can you provide all the code you are using for construction and running DESeq2?

ADD COMMENT
0
Entering edit mode
erin.gill81 ▴ 60
@eringill81-6831
Last seen 4.7 years ago
Canada

The code that I'm using is as follows:

#start R

R

#read in new sample sheet 
samples = read.csv("samples.csv",header=TRUE)

#read in count matrix
counts = read.delim("sumtechreps.csv", col.names=samples$techreps)

#attach the column data to from the sample sheet to the variable coldata
coldata = with(samples,data.frame(shortname = I(shortname), celltype = celltype, run = run, stimulus = stimulus))

#attach the count data to the variable countdata
countdata = counts

#start DESeq2
library("DESeq2")

#construct your DESeq2 data set, making sure to specify the design matrix here
dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ run + celltype + stimulus + celltype:stimulus)

#relevel controls
dds$celltype <- relevel(dds$celltype, "E")
dds$stimulus <- relevel(dds$stimulus, "C4")

#run DESeq2 on the dataset
dds <- DESeq(dds)

 

Here is what coldata looks like:

      shortname celltype run stimulus
1  2_E14-Co....        E  r2       C4
2     2_E14-CpG        E  r2      CpG
3  2_E14-TRIDAP        E  r2        T
4     2_E14-LPS        E  r2        L
5  2_E14-In....        E  r2      I24
6  2_IRAK1-....        I  r2       C4
7   2_IRAK1-CpG        I  r2      CpG
8  2_IRAK1-....        I  r2        T
9  2_IRAK1-....        I  r2      I24
10  2_IRAK1-LPS        I  r2        L
11    3_E14-CpG        E  r3      CpG
12 3_E14-TRIDAP        E  r3        T
13 3_E14-Co....        E  r3       C4
14    3_E14-LPS        E  r3        L
15 3_E14-In....        E  r3      I24
16  3_IRAK1-CpG        I  r3      CpG
17 3_IRAK1-....        I  r3        T
18 3_IRAK1-....        I  r3       C4
19  3_IRAK1-LPS        I  r3        L
20 3_IRAK1-....        I  r3      I24
21 1_E14-Co....        E  r1       C4
22    1_E14-LPS        E  r1        L
23    1_E14-CpG        E  r1      CpG
24 1_E14-TRIDAP        E  r1        T
25 1_Influe....        E  r1      I24
26  1_IRAK1-LPS        I  r1        L
27  1_IRAK1-CpG        I  r1      CpG
28 1_IRAK1-....        I  r1        T
29 1_IRAK1-....        I  r1       C4
30 1_IRAK1-....        I  r1      I24

 

 

ADD COMMENT
2
Entering edit mode

Well there's nothing wrong with the design, it's perfectly balanced. Sometimes the GLM code can have an issue converging when there is a single count in a row of all 0's and there are many interactions terms. Can you try first subsetting out the genes with small counts? We don't have this kind of code in the manual, because these rows are omitted from multiple test correction automatically, but it is a perfectly valid filtering step. Some code which says we want a normalized count of at least 10 in two or more samples:

dds <- estimateSizeFactors(dds)
nc <- counts(dds, normalized=TRUE)
filter <- rowSums(nc >= 10) >= 2
dds <- dds[filter,]

...then run DESeq()

Hopefully this would eliminate the convergence problems. If not, ping this thread and I will take a look.

ADD REPLY
0
Entering edit mode

This has ameliorated the problem, but not completely solved it. Now instead of having 1650 rows not converge, I am left with 186 rows that don't converge. I also tried omitting genes with less than 15 counts in two or more samples. This gives me 131 rows that don't converge.

ADD REPLY
1
Entering edit mode

Some more options, try increasing the 2 sample requirement to 3 or 4.

To deal with any remaining rows, you can either omit them from the results step: ddsClean <- dds[which(mcols(dds)$betaConv),], because these are typically genes with very small counts and little power, or you can increase the maximum iterations with the following code. Instead of running DESeq(), run:

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds, maxit=500)
ADD REPLY
0
Entering edit mode

Thank you so much for your help. A combination of adding the filtering step and increasing the iterations have fixed the problem.

ADD REPLY

Login before adding your answer.

Traffic: 827 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6