Differential expression testing for groups with unequal variances/dispersions?
@gordon-smyth
WEHI, Melbourne, Australia
Hi Ryan,

edgeR can't.

voom can, but you have to put it together partly yourself. Just fit voom to each timepoint separately, then cbind the voom output objects back together.

Or else just proceed in edgeR as if the dispersions are equal across timepoints. This will be conservative but won't give false positive results.

Best wishes
Gordon

> Date: Fri, 24 May 2013 12:10:09 -0700
> From: "Ryan C. Thompson" <rct at thompsonclan.org>
> To: bioconductor <bioconductor at r-project.org>
> Subject: [BioC] Differential expression testing for groups with unequal variances/dispersions?
>
> Hi all,
>
> I am studying a ChIP-Seq dataset (looking at gene promoter regions in human) where it appears that different experimental groups have widely different dispersions/variances using edgeR/limma. I have 4 timepoints, and if I use edgeR to compute the dispersion for each timepoint separately, I get:
>
> 0 hours: 0.407
> 24 hours: 0.505
> 120 hours: 0.115
> 2 weeks: 0.0531
>
> So the dispersion seems to range from 0.05 to 0.5. I am looking to test for "differential modification" between these timepoints, as well as between cell types at each timepoint, etc., and I was wondering if there is any differential expression test (or dispersion estimation method?) that can handle groups with different dispersions/variances.
>
> For reference, here is my experimental design as an Excel spreadsheet:
> https://www.dropbox.com/s/3vnk4mai3dh39yv/chipseq-samples.xlsx
>
> And here is the result of plotBCV on each group (look at the last 4 pages for the time point groups):
> https://www.dropbox.com/s/s4caq1p0h3e4zhm/groupdisps.pdf (Warning: big PDF with lots of points which may bring your PDF reader to its knees.)
>
> -Ryan Thompson
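
For scale: edgeR reports the square root of the negative binomial dispersion as the biological coefficient of variation (BCV), and under the NB model a count with mean mu has variance mu + phi * mu^2. The short Python sketch below (an illustration, not part of the original thread; the mean count of 100 is an arbitrary assumption) converts the dispersions quoted above to that scale.

```python
import math

# Per-timepoint dispersions, as quoted in the question above
dispersions = {
    "0 hours": 0.407,
    "24 hours": 0.505,
    "120 hours": 0.115,
    "2 weeks": 0.0531,
}

def bcv(phi):
    """Biological coefficient of variation: square root of the NB dispersion."""
    return math.sqrt(phi)

def nb_variance(mu, phi):
    """Negative binomial variance of a count with mean mu and dispersion phi."""
    return mu + phi * mu ** 2

MU = 100  # hypothetical mean count, chosen only for illustration

for label, phi in dispersions.items():
    print(f"{label:>9}: BCV = {bcv(phi):.2f}, variance at mu={MU} = {nb_variance(MU, phi):.0f}")
```

On this scale the BCVs run from about 0.23 (2 weeks) to about 0.71 (24 hours), and at a mean count of 100 the implied variance differs roughly eight-fold between those two timepoints.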
@ryan-c-thompson-5618
Scripps Research, La Jolla, CA
Hi Gordon,

Thanks for the tips. You say that edgeR should be conservative when the equal dispersion assumption is violated, but this is not my experience. (I probably wouldn't have asked here on the list unless I was worried about false positives.) What I've seen is that with all 4 groups included in a single analysis, the low-dispersion time points drag down the overall dispersion estimate, and this results in (apparently) anticonservative results when testing for differential modification between the two high-dispersion time points.

Obviously, I don't have a gold standard to compare against to conclude that the test is anticonservative, but I can compare the results to previous analyses that I did before the final low-dispersion time point had come off the sequencer, and, as expected, including the low-dispersion time point inflated the significance of most P-values in all contrasts.

So, to get around this, would you recommend testing between time points by first subsetting the DGEList to just the two time points being compared, then re-estimating the dispersions, and finally conducting the test? That way, each individual test would be "self-contained" and not affected by groups that are not being tested. I could imagine that under these conditions, edgeR might be conservative, as you say.

-Ryan Thompson

> Thanks for the tips. You say that edgeR should be conservative when the equal dispersion assumption is violated, but this is not my experience. (I probably wouldn't have asked here on the list unless I was worried about false positives.) What I've seen is that with all 4 groups included in a single analysis, the low-dispersion time points drag down the overall dispersion estimate, and this results in (apparently) anticonservative results when testing for differential modification between the two high-dispersion time points.

Yes, that could happen.
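
The mechanism can be seen with a back-of-envelope calculation. The Python sketch below is a hedged illustration, not edgeR's test: it uses a simple Wald z-statistic with the delta-method approximation Var(log group mean) ~= (1/mu + phi) / n, a hypothetical mean count of 100, 3 replicates per group and a true 2-fold change. As a crude stand-in for a pooled estimate it takes the plain average of the four timepoint dispersions; edgeR does not pool dispersions this way.

```python
import math

def se_log_fc(mu, phi, n):
    """Approximate SE of a log fold change between two groups of n replicates,
    both with mean count mu and NB dispersion phi (delta-method approximation)."""
    # Var(log of a group mean) ~= (1/mu + phi) / n, summed over the two groups
    return math.sqrt(2.0 * (1.0 / mu + phi) / n)

def two_sided_p(z):
    """Two-sided p-value from a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

mu, n = 100.0, 3          # hypothetical mean count and replicates per group
log_fc = math.log(2.0)    # hypothetical true 2-fold change

phi_pair = (0.407 + 0.505) / 2                       # the two timepoints being compared
phi_pooled = (0.407 + 0.505 + 0.115 + 0.0531) / 4    # crude stand-in for a pooled value

for label, phi in [("pair-specific", phi_pair), ("pooled", phi_pooled)]:
    z = log_fc / se_log_fc(mu, phi, n)
    print(f"{label:>13}: phi = {phi:.3f}, z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

Because the pooled value is smaller than the dispersion of the two timepoints actually being compared, the standard error shrinks, the z-statistic grows and the p-value falls, even though nothing about those two timepoints has changed.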

> Obviously, I don't have a gold standard to compare against to conclude that the test is anticonservative, but I can compare the results to previous analyses that I did before the final low-dispersion time point had come off the sequencer, and, as expected, including the low-dispersion time point inflated the significance of most P-values in all contrasts.

> So, to get around this, would you recommend testing between time points by first subsetting the DGEList to just the two time points being compared, then re-estimating the dispersions, and finally conducting the test? That way, each individual test would be "self-contained" and not affected by groups that are not being tested.

That could be a sensible way to go, but it's up to you.  I don't recommend this as something to do routinely.
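
As a sketch of the subsetting idea (drop the libraries from timepoints not being compared, then re-estimate), the Python code below uses a crude method-of-moments dispersion estimate on made-up counts. This is an assumption-laden stand-in: edgeR estimates dispersions by adjusted profile likelihood with empirical Bayes moderation, and in edgeR the equivalent step would be subsetting the columns of the DGEList and re-running dispersion estimation (for example with estimateDisp() in current versions). The helpers subset_columns and mom_common_dispersion are hypothetical.

```python
from statistics import mean, variance

def subset_columns(counts, groups, keep):
    """Keep only the libraries (columns) whose group label is in `keep`."""
    idx = [i for i, g in enumerate(groups) if g in keep]
    sub_counts = {gene: [row[i] for i in idx] for gene, row in counts.items()}
    return sub_counts, [groups[i] for i in idx]

def mom_common_dispersion(counts, groups):
    """Hypothetical method-of-moments common dispersion from within-group moments.
    NB variance is mu + phi * mu^2, so each gene/group gives phi ~= (s^2 - m) / m^2."""
    estimates = []
    for row in counts.values():
        for g in set(groups):
            vals = [x for x, lab in zip(row, groups) if lab == g]
            m = mean(vals)
            if len(vals) > 1 and m > 0:
                estimates.append((variance(vals) - m) / m ** 2)
    # Clamp at zero: a negative average means no detectable extra-Poisson variation
    return max(0.0, mean(estimates))

# Made-up counts: two noisy timepoints and one near-Poisson timepoint
groups = ["0h", "0h", "24h", "24h", "2w", "2w"]
counts = {
    "geneA": [50, 150, 40, 160, 98, 102],
    "geneB": [200, 600, 180, 620, 395, 405],
}

sub_counts, sub_groups = subset_columns(counts, groups, {"0h", "24h"})
print("all libraries:", round(mom_common_dispersion(counts, groups), 3))
print("0h vs 24h only:", round(mom_common_dispersion(sub_counts, sub_groups), 3))
```

With these toy counts the estimate from the 0h/24h subset is higher than the one from all six libraries, because the near-Poisson 2-week libraries pull the pooled value down.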

Why are the earlier time points so variable?  Different protocol? Presumably it is a technical issue -- it seems unlikely that a biological response would be more variable at time zero than at later times, and the dispersions seem very high.  Can the high variability of the earlier samples be mitigated by filtering or by removing an outlier library?
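
On the filtering point, one common rule in edgeR workflows is to keep only features (here, promoter regions) that exceed a counts-per-million threshold in at least as many libraries as the smallest group, since very low counts carry little information about differential binding or dispersion. The Python sketch below shows that rule; the threshold, the minimum number of libraries and the toy counts are assumptions, and it is simpler than edgeR's filterByExpr().

```python
def cpm(counts):
    """Counts per million, using each library's column total as its library size."""
    n_libs = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_libs)]
    return {g: [1e6 * row[j] / lib_sizes[j] for j in range(n_libs)]
            for g, row in counts.items()}

def filter_low_counts(counts, min_cpm=1.0, min_libs=2):
    """Keep features whose CPM exceeds min_cpm in at least min_libs libraries."""
    values = cpm(counts)
    return {g: row for g, row in counts.items()
            if sum(v > min_cpm for v in values[g]) >= min_libs}

# Hypothetical counts: three regions in three libraries
counts = {
    "region1": [500000, 600000, 550000],
    "region2": [0, 3, 1],
    "region3": [400000, 350000, 450000],
}
kept = filter_low_counts(counts, min_cpm=1.0, min_libs=2)
print(sorted(kept))  # -> ['region1', 'region3']; region2 exceeds 1 CPM in only one library
```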

If you are convinced that the difference in variability is real and not removable, and if the counts are generally not too small, then you could also try the voom option.  Voom could allow you to analyse all the libraries together and still take account of variability in each group. What you want to do is what voomaByGroup() does, but for ChIP-seq instead of microarrays.  That's only a suggestion -- I have not seriously tested voom() myself on ChIP-seq data.
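
To make the voom suggestion a little more concrete: voom works with log-CPM values, computed as log2((count + 0.5) / (library size + 1) * 1e6), and attaches precision weights to each observation so that noisier observations count for less in the linear model fit. The Python sketch below is a much-simplified illustration of group-specific weighting in that spirit, not voom or voomaByGroup: for a single gene it computes one weight per group as 1 / (within-group variance of log-CPM), with no mean-variance trend and no moderation across genes. The counts and the library size are made up.

```python
import math
from statistics import variance

LIB_SIZE = 20_000_000  # hypothetical library size, the same for every library

def log_cpm(count, lib_size):
    """log2 counts per million with voom's offsets: (count + 0.5) / (lib_size + 1)."""
    return math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)

def group_weights(counts_by_group, lib_size):
    """One precision weight per group for a single gene:
    1 / (within-group sample variance of log-CPM). No trend, no moderation."""
    weights = {}
    for group, counts in counts_by_group.items():
        y = [log_cpm(c, lib_size) for c in counts]
        weights[group] = 1.0 / variance(y)
    return weights

# Hypothetical counts for one gene, three libraries per timepoint
counts_by_group = {
    "0h": [80, 260, 150],
    "24h": [300, 90, 500],
    "120h": [200, 230, 210],
    "2w": [220, 210, 230],
}

for group, w in group_weights(counts_by_group, LIB_SIZE).items():
    print(f"{group:>4}: weight = {w:.1f}")
```

With these toy counts the two late timepoints get weights roughly two orders of magnitude larger than the early ones, which is the kind of difference a single pooled variance cannot represent.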

> I could imagine that under these conditions, edgeR might be conservative, as you say.

I would expect so.

Best wishes
Gordon
