Question

Analyzing data without replicates using DiffBind

0

Hesh ▴ 10

@hesh-14437

Last seen 5.2 years ago

University of Washington

Hi,

I have a ChIP-seq dataset which contains two histone modifications (me1 and me2), both mapped in WT and mutant cells. I don't have replicates for any of those. Before I do more replicates, I want to test if there are any significant differential binding events between WT and mutant cells for both histone modifications.

Can I analyze this data using DiffBind even though I don't have replicates?

Thank you!

diffbind bioconductor differential binding analysis without replicate chipseq • 6.6k views

ADD COMMENT • link 8.2 years ago Hesh ▴ 10

3

Gord Brown ▴ 670

@gord-brown-5664

Last seen 5.0 years ago

United Kingdom

This is the second most-hated question among statistical and/or computational people (the first being "Do I need replicates?").

How many replicates you need depends on two parameters:

1) What effect size are you trying to detect, between the experimental groups? (For example, is a fold change of 10:1 all you care about, or do you want to detect a mere 2-fold change?)

2) How variable are your replicates within the groups?

The smaller the effect size (difference between groups) you want to detect, the more replicates you need. The larger the variability within your groups, the more replicates you need. The underlying principle is known as statistical power, and there is no one answer. With two replicates, you might find that some very dramatic changes appear statistically significant, but as Rory said, the statistics will be dubious at best. Three is common, but usually at best barely adequate. Lacking other information, we usually suggest 4, on the assumption that 1 will fail, and the remaining 3 may be barely adequate to give you the magic p<0.05 for some peaks.
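To get a rough feel for how these two parameters trade off, here is a minimal sketch in base R. It is only an approximation under stated assumptions: it uses a two-sample t-test on log2-scale signal at a single peak via power.t.test, whereas DiffBind relies on count-based models (DESeq2 or edgeR) and corrects for testing many peaks at once, so real power will differ. The fold changes and standard deviations are illustrative values, not estimates from any dataset.

```r
# Approximate power of a two-sample t-test at a single peak, on a log2 scale.
# ASSUMPTION: the t-test is a stand-in for DiffBind's count-based models.
# ASSUMPTION: the sd values (0.3, 0.8) are illustrative, not measured.

reps <- 2:6  # replicates per group

power_for <- function(n, log2fc, sd) {
  # delta = true difference between group means (log2 fold change)
  # sd    = within-group standard deviation (replicate variability)
  power.t.test(n = n, delta = log2fc, sd = sd, sig.level = 0.05)$power
}

power_table <- data.frame(
  replicates    = reps,
  fc2_low_var   = sapply(reps, power_for, log2fc = 1,        sd = 0.3),
  fc2_high_var  = sapply(reps, power_for, log2fc = 1,        sd = 0.8),
  fc10_high_var = sapply(reps, power_for, log2fc = log2(10), sd = 0.8)
)

print(round(power_table, 2))
```

Reading across a row shows the effect of fold change and variability at a fixed number of replicates; reading down a column shows what each extra replicate buys.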

I really strongly suggest you consult a statistician at your institution regarding experimental design. You're putting a lot of time and effort into the lab work... a consultation with your local statistics clinic is a very low-cost way of enhancing the likelihood of a good research result from all your effort.

Best of luck,

- Gord

ADD COMMENT • link 8.2 years ago Gord Brown ▴ 670

0

Hesh ▴ 10

@hesh-14437

Last seen 5.2 years ago

University of Washington

Hi Gord,

Yeah, I get it. Thank you for the quick response.

One more question: I loaded a sample sheet with the four peaksets (two histone modifications in WT and NL cells), then ran:

me1and2_WTvsNL <- dba.contrast(me1and2_WTvsNL, categories=DBA_CONDITION, minMembers=2)

This gave a list of peaks that are significantly differentially bound between WT and NL cells. I'm confused about how DiffBind did this without replicates. Is it because I set the contrast on DBA_CONDITION, so DiffBind identified differentially bound sites between WT and NL without considering that the samples could be from two different factors? If so, could these sites be differentially bound for me1, me2, or both?

Thank you!

ADD COMMENT • link 8.2 years ago Hesh ▴ 10

0

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 12 months ago

Cambridge, UK

The contrast you have set up will check for intervals in which both histone marks change consistently (both increase or both decrease binding levels) between the WT and NL conditions. This leverages the statistics across all four of your samples, and can be a useful way to rank regions of potential interest. However, with only two replicates in each sample group, the FDR values should be treated somewhat skeptically.
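A small illustration of why this contrast could be fitted at all, using only base R. The column names follow the DiffBind sample sheet convention (SampleID, Factor, Condition); the sample IDs are made up for this sketch. Grouping on Condition alone puts the me1 and me2 samples into the same group, so each group has two members that are treated as replicates, which is what minMembers=2 permits.

```r
# Toy version of the four-sample sheet from the question.
# ASSUMPTION: the SampleID values are hypothetical.
samples <- data.frame(
  SampleID  = c("WT_me1", "WT_me2", "NL_me1", "NL_me2"),
  Factor    = c("me1", "me2", "me1", "me2"),
  Condition = c("WT", "WT", "NL", "NL")
)

# Grouping by Condition only: two samples per group, so the two marks
# act as "replicates" of each other within WT and within NL.
by_condition <- table(samples$Condition)
print(by_condition)

# Grouping by Condition and mark: one sample per cell, so there are
# no true replicates for either histone modification.
by_condition_and_mark <- table(samples$Condition, samples$Factor)
print(by_condition_and_mark)
```

The second table is the one that matters for a per-mark question: every cell holds a single sample, so there is nothing from which to estimate within-group variability for me1 or me2 separately.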

-Rory

ADD COMMENT • link 8.2 years ago Rory Stark ★ 5.2k

0

Hesh ▴ 10

@hesh-14437

Last seen 5.2 years ago

University of Washington

Thank you for the explanation!

Ideally, for a good analysis that gives statistically meaningful output, would you use 3 replicates per sample, or would 2 be sufficient?

ADD COMMENT • link 8.2 years ago Hesh ▴ 10

0

Hesh ▴ 10

@hesh-14437

Last seen 5.2 years ago

University of Washington

Thank you for the explanation! I appreciate it a lot!

ADD COMMENT • link 8.2 years ago Hesh ▴ 10

score 2 · Accepted Answer · 2017-11-21

Hi,

Without replicates, you cannot hope to test meaningfully for statistically significant differences, with DiffBind or anything else.

The DiffBind vignette describes two approaches to analysis. You might look into an "occupancy" analysis, which allows you to, for example, draw Venn diagrams showing the differences between peak sets. Choose stringent criteria for your peaks, to cut down on noise, and see if there are peaks that show up in one sample but not the other. Then visually inspect the peaks that are in one set and not the other, in IGV or some other genome browser, to see if they look convincingly different. That's about the best you can do. Sorry I can't be more helpful, but the statisticians (and DiffBind) insist on replicates for a reason.
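A toy sketch of the occupancy-style comparison described above, in base R with made-up peak coordinates and scores (none of these values come from the thread). In a real analysis the peaksets would be loaded into DiffBind, where dba.overlap and dba.plotVenn handle the overlap and Venn steps; the hand-rolled helper overlaps_any and the min_score cutoff here are only for illustration.

```r
# ASSUMPTION: hypothetical peaks and scores for one histone mark.
wt <- data.frame(
  chr   = "chr1",
  start = c(100, 500, 900, 1500),
  end   = c(200, 600, 1000, 1600),
  score = c(50, 8, 40, 35)
)
mut <- data.frame(
  chr   = "chr1",
  start = c(150, 1200, 1550),
  end   = c(250, 1300, 1650),
  score = c(45, 30, 5)
)

# Stringent filter to cut down on noisy peak calls.
min_score  <- 20
wt_strict  <- wt[wt$score >= min_score, ]
mut_strict <- mut[mut$score >= min_score, ]

# TRUE for each query peak that overlaps at least one subject peak.
overlaps_any <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    any(subject$chr == query$chr[i] &
        subject$start <= query$end[i] &
        subject$end >= query$start[i])
  }, logical(1))
}

# Candidate sample-specific peaks, to be checked by eye in a genome browser.
wt_only  <- wt_strict[!overlaps_any(wt_strict, mut_strict), ]
mut_only <- mut_strict[!overlaps_any(mut_strict, wt_strict), ]

print(wt_only)
print(mut_only)
```

Note that the WT peak at 1500-1600 lands in wt_only only because the overlapping mutant peak fell below the score cutoff. Cases like that are exactly why the candidates need visual inspection before anyone calls them different.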

- Gord