Question: DEXSeq update results change
Written 4.3 years ago by António Miguel de Jesus Domingues (Germany)
Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:
Dear Antonio,

Thanks a lot for your explanations and for sending your objects and code. I had a look at your data: apparently the difference in dispersion estimates between the old and the new versions of DEXSeq can make a difference in the coefficients of the GLM, and therefore in the exon fold changes. But these changes seem to affect specifically those exons with very low counts. For example, with the objects that you sent me:

# keep exons with more than 10 reads summed over all samples
select <- rowSums( dxr$countData ) > 10
# new vs old log2 fold changes for those exons
plot( dxr_new$log2fold_3_c_GFP_c[select], dxr_old$log2fold.3_c_c.GFP_c_c.[select] )

These numbers/plots give a much more reasonable picture. The differences come from those exons where noise is predominant. I will dig more into this, but I would not worry so much about it; the signs of the fold changes for the significant exons are consistent anyway:

# exons called significant by the old version (adjusted p < 0.1)
select2 <- which(dxr_old$padjust < 0.1)
# cross-tabulate the signs of the fold changes from the two versions
table( dxr_new$log2fold_3_c_GFP_c[select2] > 0,
       dxr_old$log2fold.3_c_c.GFP_c_c.[select2] > 0 )

        FALSE TRUE
  FALSE  1630    0
  TRUE      0  614

Best regards,
Alejandro

> Dear Wolfgang and Alejandro,
>
> First of all, thank you for looking into this.
>
> > can you send one or more specific examples, i.e.
> > - the count table for the affected gene(s), for all its exons, and/or the plotDEXSeq output
> > - the size factors
>
> I have prepared a data set + script for testing that will follow in a separate private email, so that you can look into this in detail. While preparing it I think I spotted where the difference in results might originate (1).
>
> Let me clarify that my concern is not with a particular exon, but rather with the general trend (ratio of up-regulated / down-regulated exons) that is changed, particularly in the experimental set-up I am sending you.
>
> > That also leads to the second point - with only two replicates per condition, expectations about reproducibility of the result should be modest. No amount of statistical software can undo that.
>
> I am well aware of that :) In defence of the data, I should say that the experimental validation of the DGE results (for this same data) was nearly 100%. So yes, few replicates can be an issue, but we have some experimental validation to give us assurance that not all is bad.
>
> @Alejandro
>
> > Just an additional question, do you see the shift in fold changes for all your exons or only for a subset of them? In older versions there was a bug that was causing some label swaps in the result columns, but this should be fixed in the most recent versions (I just want to make sure it is fixed!). As Wolfgang mentions, this would become evident by looking at the plotDEXSeq output (by looking at the normalized counts and exon usage).
>
> The scatter plot of fold changes of the new vs the old version is a bit funky, I must say:
> https://www.dropbox.com/s/l3snr4epgwbkty8/foldchange_comparison.png
>
> (1)
> While playing with the example data to send you, I noticed what could be an explanation when counting significantly changed exons:
>
> https://www.dropbox.com/s/7zc4n352ftjzqqe/nHits_comparison.pdf
>
> In the old version of DEXSeq without a fold-change cut-off, there are more exons with decreased inclusion than with increased inclusion (~2500/1500 exons). With increasingly higher fold-change cut-offs this is inverted. For instance, with a cut-off of 10% the ratio is 2000/1500, and with a cut-off of 50% it is 80/400. So a completely different trend. Using the new DEXSeq version, changing the fold-change cut-off makes no difference: the trend is always more exons with increased inclusion, which is sort of what I would expect.
>
> Could it be that the old version is less efficient in estimating the fold changes when the differences are minor? Well, not in estimating the fold changes but rather the dispersions. That would explain the differences I observed. And we only have 2 replicates, so we cannot expect miracles from DEXSeq.
>
> Best regards,
> António
>
> On 16 August 2014 12:24, Wolfgang Huber <whuber at embl.de> wrote:
> > Dear Antonio
> >
> > can you send one or more specific examples, i.e.
> > - the count table for the affected gene(s), for all its exons, and/or the plotDEXSeq output
> > - the size factors
> >
> > This should help all of us understand better, and perhaps fix, what you're unhappy about. What DEXSeq does is not a black box, it is in fact very simple, so we should be able to get to the bottom of this.
> >
> > Regarding the question in the second paragraph: if you have reason to assume that the biological variability is the same in all your conditions (knockdowns), then the joint dispersion estimation will be more precise. But it is not biologically implausible that the assumption may be wrong (e.g. because of the different efficiency of RNAi), leading to underestimation of the true biological variability (and therefore over-calling of results) in some conditions.
> >
> > That also leads to the second point - with only two replicates per condition, expectations about reproducibility of the result should be modest. No amount of statistical software can undo that.
> >
> > Best wishes
> > Wolfgang
>
> --
> António Miguel de Jesus Domingues, PhD
> Postdoctoral researcher
> Deep Sequencing Group - SFB655
> Biotechnology Center (Biotec)
> Technische Universität Dresden
> Fetscherstraße 105
> 01307 Dresden
>
> Phone: +49 (351) 458 82362
> Email: antonio.domingues(at)biotec.tu-dresden.de <http://biotec.tu-dresden.de>
> --
> The Unbearable Lightness of Molecular Biology
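Observation (1) in António's email boils down to tallying significant exons by direction of change at several fold-change cut-offs. The R sketch below shows one way to do that tally. The helper name `count_direction`, the reading of a cut-off of x% as |log2 fold change| >= log2(1 + x/100), and the default significance level of 0.1 are all assumptions for illustration, not the procedure used by anyone in this thread, and the toy numbers are made up.

```r
# Hypothetical helper (not part of DEXSeq): tally significant exons with
# decreased and increased usage at several fold-change cut-offs.
# Assumption: a cut-off of x% means |log2 fold change| >= log2(1 + x/100).
count_direction <- function(log2fc, padj, cutoffs_pct = c(0, 10, 50),
                            alpha = 0.1) {
  # significant exons; NA p-values or fold changes count as not significant
  sig <- !is.na(padj) & padj < alpha & !is.na(log2fc)
  counts <- vapply(cutoffs_pct, function(pct) {
    thr <- log2(1 + pct / 100)          # cut-off on the log2 scale
    keep <- sig & abs(log2fc) >= thr    # significant and large enough
    c(cutoff_pct = pct,
      decreased  = sum(keep & log2fc < 0),
      increased  = sum(keep & log2fc > 0))
  }, c(cutoff_pct = 0, decreased = 0, increased = 0))
  t(counts)  # one row per cut-off
}

# Toy input for illustration only (made-up numbers, not data from this thread)
toy_fc   <- c(-2, -0.05, 0.1, 1, 0.3, -0.5)
toy_padj <- c(0.01, 0.05, 0.02, 0.5, NA, 0.09)
res <- count_direction(toy_fc, toy_padj)
print(res)
```

With the objects discussed here, a call for the new version might look like `count_direction(dxr_new$log2fold_3_c_GFP_c, dxr_new$padj)`, assuming the adjusted p-value column is named `padj`; Alejandro's code uses `padjust` for the old results table. Comparing the two tables row by row would show whether the ratio of decreased to increased exons flips as the cut-off grows.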
António Miguel de Jesus Domingues (Germany) replied:
Hi Alejandro,

thanks again for looking into this.

> I had a look at your data, apparently the difference in dispersion estimates between the old and the new versions of DEXSeq can make a difference in the coefficients of the GLM, therefore the exon fold changes. But these changes seem to be specifically affecting only those exons with very low counts.

This is very reassuring and makes sense. The new version is the way to go then :)

Best regards,
António