Question: DEXSeq update results change
4.4 years ago by
Germany
Dear Alejandro,

I just wanted to follow up on this to say that I also see quite a big difference between the "new" and the "old" DEXSeq. Not only is the number of differentially expressed exons much larger in the new version (in one experiment it nearly quadrupled), but the direction of change has also shifted. That is, where upon knock-down there used to be about 50% more exon exclusion than inclusion, it is now the other way around. This does not happen in all my knock-downs (I have seven of them), but it is enough to make me wary of previous conclusions based on the old version. As before, DEXSeq was run with the default options. Perhaps my experimental design (only 2 biological replicates per condition) is not the best for judging how different the results of the two DEXSeq versions are, but other users should bear in mind that some changes in results may happen.

Regarding my experimental design: I am building the DEXSeqDataSet object with only 2 conditions (4 samples) to do the pairwise comparisons. Since I have a control and 7 conditions, is it possible, as with DESeq2, to build the object, estimate the dispersions and do the comparisons with all the samples, and then extract only the results of the comparisons of interest? And if so, does it offer a statistical advantage? My gut feeling says yes, but it says many wrong things all the time :) (I am attaching a dispersion plot from one comparison for DEXSeq 1.10; sessionInfo is at the bottom of the email.)

On the matter of package changes, and I put this question up for discussion on the list: where should the threshold be for a change in a package to also warrant a change in name? Changes in function wrappers and bug corrections are all fine, but when the results stop being reproducible (and not because of bug fixing), should it be time to think about it? We have seen this happen with DESeq, which after major changes became DESeq2. This is not a dig at you, just genuine curiosity and concern as a user.
Best,
António

> Dear Marco Marconi,
>
> I think that was the version where we changed from our original method, the one described in the paper, to the recent approach; you will find the details in the section "Methodological changes since publication of the paper". As you might have noticed, the dispersions are highly correlated, and so are the p-values.
>
> I don't think the change in the p-value, and therefore in the adjusted p-value, is a big issue, since it is not changing dramatically. The simplest thing would be to increase your FDR threshold a bit.
>
> Best regards,
> Alejandro
>
>> Hello,
>>
>> After performing a general Bioconductor update to the new version, I noticed that the DEXSeq package 1.8.0 now gives me different results from the previous version 1.6.0. To begin with, its functions print dots ("...") to stdout, which the previous version did not do. This is not a big issue; the problem is that I am now obtaining different results. Generally, the padjust values are bigger.
>>
>> For example, this exon:
>>
>>          a1   a2   a3   b1   b2   b3
>> EXXXX   126   90  101   81  233  225
>>
>> gets different results:
>>
>> geneID,exonID,dispersion,pvalue,padjust,meanBase,log2fold(b/a)
>>
>> old version:
>> EXXXX,0.0684906370633231,0.00256847378387803,0.0321347815544768,129.941383199307,-0.217272839643456
>>
>> new version:
>> EXXXX,0.0928452378435829,0.00401881761350959,0.0587521235795571,129.941383199307,-0.213275654796358
>>
>> As you can see, the old one has a padjust below 0.05 and the new one above 0.05, which is a big problem.
>>
>> I had a look at the NEWS section of the DEXSeq package, but I couldn't find any information about major changes.
>>
>> Thank you very much, regards,

    sessionInfo()
    R version 3.1.1 (2014-07-10)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
    [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
    [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] ggplot2_1.0.0 plyr_1.8.1 DEXSeq_1.10.8 BiocParallel_0.6.1 DESeq2_1.4.5 RcppArmadillo_0.4.320.0
    [7] Rcpp_0.11.2 GenomicRanges_1.16.2 GenomeInfoDb_1.0.2 IRanges_1.21.43 Biobase_2.24.0 BiocGenerics_0.10.0

    loaded via a namespace (and not attached):
    [1] annotate_1.42.1 AnnotationDbi_1.26.0 BatchJobs_1.3 BBmisc_1.7 biomaRt_2.20.0 Biostrings_2.32.0
    [7] bitops_1.0-6 brew_1.0-6 checkmate_1.2 codetools_0.2-8 colorspace_1.2-4 DBI_0.2-7
    [13] digest_0.6.4 fail_1.2 foreach_1.4.2 genefilter_1.46.1 geneplotter_1.42.0 grid_3.1.1
    [19] gtable_0.1.2 hwriter_1.3 iterators_1.0.7 lattice_0.20-29 locfit_1.5-9.1 MASS_7.3-33
    [25] munsell_0.4.2 proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.4 Rsamtools_1.16.0
    [31] RSQLite_0.11.4 scales_0.2.4 sendmailR_1.1-2 splines_3.1.1 statmod_1.4.20 stats4_3.1.1
    [37] stringr_0.6.2 survival_2.37-7 tools_3.1.1 XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0
    [43] zlibbioc_1.10.0

--
António Miguel de Jesus Domingues, PhD
Postdoctoral researcher
Deep Sequencing Group - SFB655
Biotechnology Center (Biotec)
Technische Universität Dresden
Fetscherstraße 105
01307 Dresden

Phone: +49 (351) 458 82362
Email: antonio.domingues(at)biotec.tu-dresden.de
--
The Unbearable Lightness of Molecular Biology

[Attachment: 3_c_fitDispersion.png, dispersion plot (image/png, 66043 bytes): https://stat.ethz.ch/pipermail/bioconductor/attachments/20140815/fab7dce9/attachment.png]
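The question above about estimating dispersions from all 16 samples at once, rather than from the 4 samples of each pairwise comparison, largely comes down to how many residual degrees of freedom feed each estimate. The following is a minimal base-R sketch that works only by analogy: it estimates a normal variance, not the negative-binomial dispersions DEXSeq fits, and it assumes the true variability is identical in every condition. The function and variable names are hypothetical and none of this is DEXSeq code.

```r
# Hypothetical illustration, not DEXSeq code: precision of a variance estimate
# from one pairwise comparison vs. one pooled over all conditions, assuming
# the true variance is the same in every condition (normal-data analogy).
set.seed(1)

n_conditions <- 8     # control + 7 knock-downs
n_reps       <- 2     # biological replicates per condition
true_sd      <- 1
n_sim        <- 5000

simulate_estimates <- function() {
  # one column per condition, one row per replicate
  x <- matrix(rnorm(n_conditions * n_reps, sd = true_sd), nrow = n_reps)
  # within-condition sample variances (n_reps - 1 degrees of freedom each)
  v <- apply(x, 2, var)
  # with equal group sizes the pooled variance is the mean of group variances
  c(pairwise = mean(v[1:2]),  # control + one knock-down: 4 samples
    joint    = mean(v))       # all conditions: 16 samples
}

# 2 x n_sim matrix, rows named "pairwise" and "joint"
est <- replicate(n_sim, simulate_estimates())

# both estimators are unbiased for true_sd^2 ...
est_mean <- rowMeans(est)
# ... but the pooled one scatters less around it
est_sd <- apply(est, 1, sd)
print(rbind(mean = est_mean, sd = est_sd))
```

With two replicates per condition each within-condition variance carries one degree of freedom, so the pairwise estimate rests on 2 and the joint one on 8, and the spread of the joint estimate should come out at roughly half that of the pairwise one. The flip side is the equal-variability assumption: if one knock-down is genuinely noisier than the others, pooling pulls its estimate towards theirs and underestimates it.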
dexseq deseq sequencing
modified 4.4 years ago • written 4.4 years ago by António Miguel de Jesus Domingues
4.4 years ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber wrote:
Dear Wolfgang and Alejandro,

First of all, thank you for looking into this.

> can you send one or more specific examples, i.e.
> - the count table for the affected gene(s), for all its exons, and/or the plotDEXSeq output
> - the size factors

I have prepared a data set and script for testing that will follow in a separate private email, so that you can look into this in detail. While preparing it, I think I spotted where the difference in results might originate (1).

Let me clarify that my concern is not with a particular exon, but rather with the general trend (the ratio of up-regulated to down-regulated exons), which has changed, particularly in the experimental set-up I am sending you.

> That also leads to the second point - with only two replicates per condition, expectations about reproducibility of the result should be modest. No amount of statistical software can undo that.

I am well aware of that :) In defence of the data, I should say that the experimental validation of the DGE results (for this same data) was nearly 100%. So yes, few replicates can be an issue, but we have some experimental validation that gives us assurance that not all is bad.

@Alejandro

> Just an additional question, do you see the shift in fold changes for all your exons or only for a subset of them?
> In older versions there was a bug that was causing some label swaps in the result columns, but this should be fixed in the most recent versions (I just want to make sure it is fixed!). As Wolfgang mentions, this would become evident by looking at the plotDEXSeq output (by looking at the normalized counts and exon usage).
The scatter plot of fold changes of the new vs. the old version is a bit funky, I must say:
https://www.dropbox.com/s/l3snr4epgwbkty8/foldchange_comparison.png

(1) While playing with the example data to send you, I noticed what could be an explanation when counting significantly changed exons:

https://www.dropbox.com/s/7zc4n352ftjzqqe/nHits_comparison.pdf

In the old version of DEXSeq without a fold-change cut-off, there are more exons with decreased inclusion than with increased inclusion (~2500/1500 exons). With increasingly higher fold-change cut-offs this is inverted. For instance, with an FC cut-off of 10% it is 2000/1500, and with an FC cut-off of 50% it is 80/400. So, a completely different trend. Using the new DEXSeq version, changing the FC cut-off makes no difference: the trend is always more exons with increased inclusion, which is sort of what I would expect.

Could it be that the old version is less efficient at estimating the fold changes when the differences are minor? Well, not at estimating the fold changes, but rather the dispersions. That would explain the differences I observed. And we only have 2 replicates, so we cannot expect miracles from DEXSeq.

Best regards,
António

On 16 August 2014 12:24, Wolfgang Huber <whuber at embl.de> wrote:
> Dear Antonio
>
> can you send one or more specific examples, i.e.
> - the count table for the affected gene(s), for all its exons, and/or the plotDEXSeq output
> - the size factors
>
> This should help all of us understand better, and perhaps fix, what you're unhappy about.
> What DEXSeq does is not a black box, it is in fact very simple, so we should be able to get to the bottom of this.
>
> Regarding the question in the second paragraph: if you have reason to assume that the biological variability is the same in all your conditions (knockdowns), then the joint dispersion estimation will be more precise. But it is not biologically implausible that the assumption may be wrong (e.g. because of the different efficiency of RNAi), leading to underestimation of the true biological variability (and therefore over-calling of results) in some conditions.
>
> That also leads to the second point - with only two replicates per condition, expectations about reproducibility of the result should be modest. No amount of statistical software can undo that.
>
> Best wishes
> Wolfgang
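A table like the one described above (numbers of significant exons with decreased vs. increased inclusion at several fold-change cut-offs) can be produced with a small helper. This is a hypothetical sketch, not something provided by DEXSeq: it takes plain vectors of log2 fold changes and adjusted p-values, since the result-column names differ between DEXSeq versions (e.g. log2fold.3_c_c.GFP_c_c. vs. log2fold_3_c_GFP_c in this thread). It assumes that an "FC cut-off of 10%" means a fold change of at least 1.1 in either direction, i.e. |log2 FC| >= log2(1.1), and the significance threshold of 0.1 is likewise an assumption. The input values in the example are made up.

```r
# Hypothetical helper, not part of DEXSeq: count significant exons with
# decreased / increased inclusion at several fold-change cut-offs.
# Assumption: a cut-off of 10% means |log2 fold change| >= log2(1.1).
count_direction <- function(log2fc, padj, alpha = 0.1,
                            fc_cutoffs = c(0, 0.1, 0.5)) {
  stopifnot(length(log2fc) == length(padj))
  # significant exons; NA p-values or fold changes are dropped
  sig <- !is.na(padj) & padj < alpha & !is.na(log2fc)
  counts <- sapply(fc_cutoffs, function(cut) {
    keep <- sig & abs(log2fc) >= log2(1 + cut)
    c(decreased = sum(log2fc[keep] < 0),
      increased = sum(log2fc[keep] > 0))
  })
  # one row per cut-off, one column per direction
  counts <- t(counts)
  rownames(counts) <- paste0("fc >= ", 100 * fc_cutoffs, "%")
  counts
}

# Toy input with made-up values
lfc  <- c(-0.05, -0.08, -0.2, 0.3, 0.7, 0.9, -1.0, 0.02)
padj <- c( 0.01,  0.02, 0.03, 0.04, 0.01, 0.2, 0.05,  NA)
toy_counts <- count_direction(lfc, padj)
print(toy_counts)
```

Running the helper on the old and the new result tables side by side makes the pattern described above directly comparable: a direction ratio that flips as the cut-off rises points at many small, noisy fold changes rather than at a consistent biological shift.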
Dear Antonio,

Thanks a lot for your explanations and for sending your objects and code. I had a look at your data. Apparently, the difference in dispersion estimates between the old and the new versions of DEXSeq can make a difference in the coefficients of the GLM, and therefore in the exon fold changes. But these changes seem to specifically affect only those exons with very low counts. For example, with the objects that you sent me:

    # keep exons with more than 10 reads in total across samples
    select <- rowSums( dxr$countData ) > 10
    # fold changes of the new vs. the old version for those exons
    plot( dxr_new$log2fold_3_c_GFP_c[select], dxr_old$log2fold.3_c_c.GFP_c_c.[select] )

These numbers/plots give a much more reasonable picture: the differences come from those exons where noise predominates. I will dig more into this, but I would not worry too much about it; the signs for the significant exons are consistent anyway:

    # exons significant (padjust < 0.1) in the old version
    select2 <- which( dxr_old$padjust < 0.1 )
    # rows: new log2fold > 0; columns: old log2fold > 0
    table( dxr_new$log2fold_3_c_GFP_c[select2] > 0, dxr_old$log2fold.3_c_c.GFP_c_c.[select2] > 0 )

            FALSE TRUE
      FALSE  1630    0
      TRUE      0  614

Best regards,
Alejandro
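The two checks above (dropping exons with low total counts, and tabulating the sign of the fold change among exons significant in the old run) can be wrapped in a small self-contained helper. This is a hypothetical sketch rather than DEXSeq functionality: the function name, its arguments and the toy vectors are made up, and it assumes that both fold-change vectors refer to the same exons in the same order. The count filter is optional, because the table above was computed without it. The helper also reports the fraction of exons whose sign agrees, which is 1 for the table shown above (1630 + 614 concordant, 0 discordant).

```r
# Hypothetical helper, not part of DEXSeq: sign agreement of exon fold changes
# between two runs (e.g. old vs. new DEXSeq), restricted to exons called
# significant in the reference run and, optionally, to exons with enough counts.
# Assumes all inputs refer to the same exons in the same order.
sign_concordance <- function(lfc_new, lfc_old, padj_old, alpha = 0.1,
                             counts = NULL, min_count = 10) {
  stopifnot(length(lfc_new) == length(lfc_old),
            length(lfc_old) == length(padj_old))
  keep <- !is.na(padj_old) & padj_old < alpha
  if (!is.null(counts)) {
    # same idea as rowSums(countData) > 10: drop low-count exons
    keep <- keep & rowSums(counts) > min_count
  }
  # fixed levels so the table is always 2 x 2, even if one sign is absent
  lv <- c(FALSE, TRUE)
  tab <- table(new_positive = factor(lfc_new[keep] > 0, levels = lv),
               old_positive = factor(lfc_old[keep] > 0, levels = lv))
  list(table = tab, agreement = sum(diag(tab)) / sum(tab))
}

# Toy input with made-up values (6 exons, 2 samples)
lfc_old  <- c(-1.2, -0.4, 0.8,  0.3, -0.1, 2.0)
lfc_new  <- c(-1.0, -0.5, 0.7, -0.2,  0.1, 1.8)
padj_old <- c(0.01, 0.05, 0.02, 0.09, 0.5,  NA)
counts   <- cbind(c(20, 4, 15, 2, 50, 30), c(10, 4, 10, 2, 40, 30))

res_all      <- sign_concordance(lfc_new, lfc_old, padj_old)
res_filtered <- sign_concordance(lfc_new, lfc_old, padj_old, counts = counts)
print(res_all$table)
print(c(all = res_all$agreement, filtered = res_filtered$agreement))
```

In the toy data the single discordant exon has only 4 reads in total, so the count filter removes it; this mirrors, on made-up numbers, the observation that the disagreements come from low-count exons.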
4.4 years ago by
Germany
Hi Alejandro,

thanks again for looking into this.

> I had a look at your data. Apparently, the difference in dispersion estimates between the old and the new versions of DEXSeq can make a difference in the coefficients of the GLM, and therefore in the exon fold changes. But these changes seem to specifically affect only those exons with very low counts.

This is very reassuring and makes sense. The new version is the way to go, then :)

Best regards,
António