DESeq2: What is the impact if glm.fit does not converge during rlogTransformation or varianceStabilizingTransformation
2
0
Entering edit mode
btemperton • 0
@btemperton-6804
Last seen 6.9 years ago
United Kingdom

Hi there,

I am currently dealing with some pilot study data from a 16S rRNA survey from sediments. The data consists of 24 sites (no replication) of MiSeq 16S amplicon data processed through Mothur and clustered into ~227,216 OTUs. Based on the paper by McMurdie and Holmes (2014) on the inadmissability of rarefying data, it was decided to run the raw OTU counts through DESeq2 to normalize the data.

However, running rlogTransformation on the DESeqDataSet (dds) results in a warning message that the glm.fit algorithm did not converge. The same is also true if using the varianceStabiliziingTransformation function (although in our case rlog is more appropriate). Both methods are being run with blind=TRUE as is appropriate at this stage. My question is: What is the effect on normalization if the glm.fit does not converge and what are the likely causes of non-convergence?

Many thanks,

Ben

---------------------------------------- Commands ----------------------------------------

> library('DESeq2')
>
>
> #Set up the experimental design matrix
> site_data<-as.data.frame(as.character(1:ncol(count_data)), row.names=colnames(count_data))
> colnames(site_data)<-'site_type'
>
> #Perform Deseq2 magic
> dds<-DESeqDataSetFromMatrix(countData=count_data, colData=site_data, design= ~site_type)
Usage note: the following factors have 3 or more levels:

site_type

For DESeq2 versions < 1.3, if you plan on extracting results for
these factors, we recommend using betaPrior=FALSE as an argument
when calling DESeq().
As currently implemented in version 1.2, the log2 fold changes can
vary if the base level is changed, when extracting results for a
factor with 3 or more levels. A solution will be implemented in
version 1.3 which allows for the use of a beta prior and symmetric
log2 fold change estimates regardless of the selection of base level.
> dds$site_type<-factor(dds$site_type, levels=as.character(1:ncol(count_data)))
> #Look at the normalized values. Because the size factors of the datasets are considerable, we will use
> # a regularized log transformation instead of a variance stabilizing transformation.
> rld<-rlogTransformation(dds)
Warning message:
glm.fit: algorithm did not converge

---------------------------------------- Session Info ----------------------------------------

R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DESeq2_1.2.10             RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3               GenomicRanges_1.14.4      XVector_0.2.0
[6] IRanges_1.20.7            BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.3.1            genefilter_1.44.0    grid_3.0.2
[7] lattice_0.20-29      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2
[13] survival_2.37-7      tools_3.0.2          XML_3.98-1.1         xtable_1.7-4

deseq2 normalization microbiome • 4.0k views
1
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

There's actually not a problem here, you can use either transformation without concern.

That message is a bit confusing, and it was fixed in the current release (which is DESeq2 1.4, released in Spring 2014).

The message is actually coming from one iteration of the parametric dispersion fit, which also involves fitting a GLM of the dispersions over the mean. So one iteration of the fitting of the dispersion trend didn't converge, but then there is no other note, so the final iteration did converge.

0
Entering edit mode
Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 18 months ago
Zentrum für Molekularbiologie, Universi…

The 'rlogTranform' fits one GLM  for each gene (or, in your case, each OTU). It seems that the fit failed for just one of your 200,000 OTUs, which should be harmless.

A few check whether the rlog transformation worked well are always in order, through. For example, you coulde plot for oneor some of your samples normal log against rlog:

plot( log2( counts( dds, normalized=TRUE )[,1], rld[,1] ), assay(rld)[,1] )

You should get a cloud around a bend line which turns into a straight diagonal for high counts.