Question

DESeq2: What is the impact if glm.fit does not converge during rlogTransformation or varianceStabilizingTransformation

0

Entering edit mode

btemperton • 0

@btemperton-6804

Last seen 9.1 years ago

United Kingdom

Hi there,

I am currently dealing with some pilot study data from a 16S rRNA survey from sediments. The data consists of 24 sites (no replication) of MiSeq 16S amplicon data processed through Mothur and clustered into ~227,216 OTUs. Based on the paper by McMurdie and Holmes (2014) on the inadmissability of rarefying data, it was decided to run the raw OTU counts through DESeq2 to normalize the data.

However, running rlogTransformation on the DESeqDataSet (dds) results in a warning message that the glm.fit algorithm did not converge. The same is also true if using the varianceStabiliziingTransformation function (although in our case rlog is more appropriate). Both methods are being run with blind=TRUE as is appropriate at this stage. My question is: What is the effect on normalization if the glm.fit does not converge and what are the likely causes of non-convergence?

Many thanks,

Ben

---------------------------------------- Commands ----------------------------------------

> library('DESeq2')
> #Load the data
> count_data<-read.delim('./data/allOTU_counts.reformat.txt', header=T, row.names=1)
>
>
> #Set up the experimental design matrix
> site_data<-as.data.frame(as.character(1:ncol(count_data)), row.names=colnames(count_data))
> colnames(site_data)<-'site_type'
>
> #Perform Deseq2 magic
> dds<-DESeqDataSetFromMatrix(countData=count_data, colData=site_data, design= ~site_type)
Usage note: the following factors have 3 or more levels:

site_type

For DESeq2 versions < 1.3, if you plan on extracting results for
these factors, we recommend using betaPrior=FALSE as an argument
when calling DESeq().
As currently implemented in version 1.2, the log2 fold changes can
vary if the base level is changed, when extracting results for a
factor with 3 or more levels. A solution will be implemented in
version 1.3 which allows for the use of a beta prior and symmetric
log2 fold change estimates regardless of the selection of base level.
> dds$site_type<-factor(dds$site_type, levels=as.character(1:ncol(count_data)))
> #Look at the normalized values. Because the size factors of the datasets are considerable, we will use
> # a regularized log transformation instead of a variance stabilizing transformation.
> rld<-rlogTransformation(dds)
Warning message:
glm.fit: algorithm did not converge

---------------------------------------- Session Info ----------------------------------------

R version 3.0.2 (2013-09-25)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8
[6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq2_1.2.10 RcppArmadillo_0.4.450.1.0 Rcpp_0.11.3 GenomicRanges_1.14.4 XVector_0.2.0
[6] IRanges_1.20.7 BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] annotate_1.40.1      AnnotationDbi_1.24.0 Biobase_2.22.0       DBI_0.3.1            genefilter_1.44.0    grid_3.0.2
[7] lattice_0.20-29      locfit_1.5-9.1       RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.2        stats4_3.0.2
[13] survival_2.37-7      tools_3.0.2          XML_3.98-1.1         xtable_1.7-4

deseq2 normalization microbiome • 4.6k views

ADD COMMENT • link updated 9.6 years ago by Michael Love 41k • written 9.6 years ago by btemperton • 0

score 1 · Answer 1 · 2014-10-02

There's actually not a problem here, you can use either transformation without concern.

That message is a bit confusing, and it was fixed in the current release (which is DESeq2 1.4, released in Spring 2014).

The message is actually coming from one iteration of the parametric dispersion fit, which also involves fitting a GLM of the dispersions over the mean. So one iteration of the fitting of the dispersion trend didn't converge, but then there is no other note, so the final iteration did converge.

score 0 · Answer 2 · 2014-10-02

The 'rlogTranform' fits one GLM for each gene (or, in your case, each OTU). It seems that the fit failed for just one of your 200,000 OTUs, which should be harmless.

A few check whether the rlog transformation worked well are always in order, through. For example, you coulde plot for oneor some of your samples normal log against rlog:

plot( log2( counts( dds, normalized=TRUE )[,1], rld[,1] ), assay(rld)[,1] )

You should get a cloud around a bend line which turns into a straight diagonal for high counts.