DESeq warning: "Dispersion fit did not converge" with replicates counts data
Jiewencai ▴ 20
@jiewencai-6258
Hi,

I have RNA-Seq read counts for 4 samples. The data look like this:

gene_id    CK-1   CK-2   T-1    T-2
isoform1   66     78     133    150
isoform2   2106   1764   1965   1894
isoform3   17452  19469  21509  27042
isoform4   156    165    7      9

where CK-1 and CK-2 are replicates of one condition, and T-1 and T-2 are replicates of the other condition.

When I try to find differentially expressed isoforms with DESeq or DESeq2, I get the warning "Dispersion fit did not converge".

Simon said that this warning arises when analysing data without replicates. But my data do have replicates, so why does this warning occur? Is there a problem with my count data? Would this warning reduce the accuracy of my results?

Any help is appreciated!

Best,
Wencai Jie

########## DESeq commands ##########
library("DESeq")
condition = factor(c("CK","CK","Treated","Treated"))
countTable = read.table("counts.txt",header=T,row.names=1)
cds = newCountDataSet(countTable,condition)
cds = estimateSizeFactors(cds)
cds = estimateDispersions(cds)

Warning:
In parametricDispersionFit(means, disps) : Dispersion fit did not converge.

########## DESeq2 commands ##########
library("DESeq2")
countData = read.table("counts.txt",header=T,row.names=1)
colData = DataFrame(condition=factor(c("CK","CK","Treated","Treated")))
dds = DESeqDataSetFromMatrix(countData=countData,colData=colData,design =~ condition)
dds = DESeq(dds)

Warning:
In parametricDispersionFit(mcols(objectNZ)$baseMean[useForFit], :
  dispersion fit did not converge

########## sessionInfo() ##########
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] parallel  stats  graphics  grDevices  utils  datasets  methods
[8] base

other attached packages:
[1] DESeq2_1.0.19    RcppArmadillo_0.3.920.3  Rcpp_0.10.6
[4] lattice_0.20-24  Biobase_2.20.1           GenomicRanges_1.12.5
[7] IRanges_1.18.4   BiocGenerics_0.6.0

loaded via a namespace (and not attached):
 [1] annotate_1.38.0     AnnotationDbi_1.22.6  DBI_0.2-7
 [4] genefilter_1.42.0   grid_3.0.1            locfit_1.5-9.1
 [7] RColorBrewer_1.0-5  RSQLite_0.11.4        splines_3.0.1
[10] stats4_3.0.1        survival_2.37-4       XML_3.98-1.1
[13] xtable_1.7-1
DESeq DESeq2 • 2.7k views
@mikelove
Hi Jie,

The warning continues in DESeq2, saying that a local fit was substituted for the parametric fit, and to check the plot of dispersion estimates using plotDispEsts() for the quality of the local regression curve. I should probably extend the text of the warning, because users seem to get stuck here, thinking that they cannot continue.

It means that the software tried the default method, but could not fit the parametric curve through the dispersion estimates over the mean counts. The parametric function works well for most but not all RNA-Seq experiments.

You should then follow the instructions in the next line of the warning and check the plot to make sure that the local regression curve is not being overly influenced by outlier points. If it seems a good fit, then you can continue with the analysis as is. If not, then you should instead use fitType="mean" with DESeq().

Mike

On Mon, Nov 25, 2013 at 9:52 AM, Jiewencai <jiewencai@qq.com> wrote:
> When I try to find differential expression isoforms by DESeq or DESeq2,
> I got the Warning: "Dispersion fit did not converge".
> [...]
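A minimal sketch of the check described in this reply, assuming a reasonably recent DESeq2. The simulated counts from makeExampleDESeqDataSet() are a stand-in for the poster's counts.txt, and the object names (ddsRaw, ddsMean) are illustrative only. Reading the "fitType" attribute of dispersionFunction() is how I would expect to see which fit was actually used; treat that as an assumption to verify against your installed version.

```r
library("DESeq2")

# Simulated counts: 1000 genes, 4 samples in two conditions (2 vs 2).
# This stands in for the real count table, which is not available here.
set.seed(1)
ddsRaw <- makeExampleDESeqDataSet(n = 1000, m = 4)

# Default run: tries the parametric dispersion trend first. If that fit
# does not converge, DESeq2 warns and substitutes a local regression fit.
dds <- DESeq(ddsRaw)

# Which trend was actually used ("parametric", or "local" after a fallback).
usedFit <- attr(dispersionFunction(dds), "fitType")
print(usedFit)

# Gene-wise estimates, fitted trend, and final estimates against the mean.
# Check that the fitted curve follows the bulk of the points and is not
# dragged around by a few outliers.
plotDispEsts(dds)

# If the curve looks poor, use the mean of the gene-wise estimates as the
# fitted value instead of a trend over the mean.
ddsMean <- DESeq(ddsRaw, fitType = "mean")
plotDispEsts(ddsMean)

res <- results(ddsMean)
```

With the real data, ddsRaw would be the object returned by DESeqDataSetFromMatrix() in the original post; the rest of the calls stay the same.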
Hi Mike,

Thanks for your help. Do you mean plotDispEsts()? Maybe it is a good fit, but I am not sure. I attached the counts file to the email. Could you help me judge it?

Also, could you explain why DESeq fails to fit the parametric curve through the dispersion estimates over the mean counts? Is it because the counts differ too much between the two conditions? About 9% of the isoforms have a log2FC greater than 2 or less than -2 between CK and T.

Wencai