Question: Analyzing data without replicates using DiffBind
0
Hesh (University of Washington) wrote 21 days ago:

Hi, 

I have a ChIP-seq dataset containing two histone modifications (me1 and me2), both mapped in WT and mutant cells. I don't have replicates for any of these samples. Before I generate replicates, I want to test whether there are any significant differential binding events between WT and mutant cells for each histone modification.

Can I analyze this data using DiffBind even though I don't have replicates? 

 

Thank you! 

Modified 19 days ago; written 21 days ago by Hesh.
2
Gord Brown (United Kingdom) wrote 21 days ago:

Hi,

Without replicates, you cannot hope to test meaningfully for statistically significant differences, with DiffBind or anything else.

The DiffBind vignette describes two approaches to analysis.  You might look into an "occupancy" analysis, which allows you to, for example, draw Venn diagrams showing the differences between peak sets.  Choose stringent criteria for your peaks, to cut down on noise, and see if there are peaks that show up in one sample but not the other.  Then inspect the ones that are in one set and not the other visually via IGV or some other genome browser, to see if they look convincingly different.  That's about the best you can do.  Sorry I can't be more helpful, but the statisticians (and DiffBind) insist on replicates for a reason.
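For illustration, the occupancy comparison described above might look roughly like this in R. This is a minimal sketch rather than a definitive recipe: the wrapper name `occupancy_sketch`, the file name `samples.csv`, and the assumption that the sample sheet's Factor column holds the mark names (`me1`, `me2`) are all hypothetical, and the function needs the DiffBind package and real peak files to do anything when called.

```r
# Sketch of an occupancy (peak-overlap) comparison for one histone mark.
# Assumptions (hypothetical): the sample sheet has one row per peakset and its
# Factor column contains the mark name passed in `mark` (e.g. "me1").
occupancy_sketch <- function(sample_sheet, mark) {
  # Load the peaksets listed in the sample sheet
  samples <- DiffBind::dba(sampleSheet = sample_sheet)

  # Logical mask selecting the peaksets for this mark (here: WT and mutant)
  mark_mask <- DiffBind::dba.mask(samples, DiffBind::DBA_FACTOR, mark)

  # Venn diagram of the overlap between the selected peaksets
  DiffBind::dba.plotVenn(samples, mark_mask)

  # Peaks found only in the first sample (onlyA), only in the second (onlyB),
  # and in both (inAll)
  DiffBind::dba.overlap(samples, mark_mask, mode = DiffBind::DBA_OLAP_PEAKS)
}

# Example call (not run here; requires DiffBind and real peak files):
# olap <- occupancy_sketch("samples.csv", "me1")
# olap$onlyA   # candidate sample-specific peaks to inspect in IGV
```

The sample-specific peak lists are only candidates; as noted above, they still need visual inspection in a genome browser.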

 - Gord

0
Hesh (University of Washington) wrote 21 days ago:

Hi Gord, 

Yeah, I get it. Thank you for the quick response. 

One more question: I loaded a sample sheet with the four peaksets (two histone modifications, each in WT and NL cells), then ran:

me1and2_WTvsNL <- dba.contrast(me1and2_WTvsNL, categories=DBA_CONDITION, minMembers=2)

This gave a list of peaks that are significantly differentially bound between WT and NL cells. I'm confused about how DiffBind did this without replicates. Is it because I set the contrast on DBA_CONDITION, so DiffBind identified differentially bound sites between WT and NL without considering that the samples come from two different factors? If so, could these sites be differentially bound for me1, me2, or both?

Thank you!

 

Modified 21 days ago; written 21 days ago by Hesh.
0
Rory Stark (CRUK, Cambridge, UK) wrote 20 days ago:

The contrast you have set up will check for intervals in which both histone marks change consistently (both increase or both decrease binding levels) between the WT and NL conditions. This leverages the statistics across all four of your samples, and can be a useful way to rank regions of potential interest. However, with only two samples in each group (the two histone marks standing in as replicates), the FDR values should be treated somewhat skeptically.
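To see how the samples were grouped, it may help to print the contrast before running the analysis. A rough sketch, assuming a hypothetical sample sheet that also lists the BAM files needed for read counting, and a hypothetical wrapper name `condition_contrast_sketch`; it needs DiffBind and real data to run when called:

```r
# Sketch: contrast on Condition only, so me1 and me2 samples from the same
# condition end up in the same group and are treated like replicates.
# Assumption (hypothetical): the sample sheet includes a bamReads column.
condition_contrast_sketch <- function(sample_sheet) {
  samples <- DiffBind::dba(sampleSheet = sample_sheet)

  # Count reads in the consensus peak set
  samples <- DiffBind::dba.count(samples)

  # Group samples by Condition, requiring at least 2 samples per group
  samples <- DiffBind::dba.contrast(samples,
                                    categories = DiffBind::DBA_CONDITION,
                                    minMembers = 2)

  # Show the contrasts that were set up, including the size of each group
  print(DiffBind::dba.show(samples, bContrasts = TRUE))

  # Run the differential analysis and return the reported sites
  samples <- DiffBind::dba.analyze(samples)
  DiffBind::dba.report(samples)
}

# Example call (not run here):
# sites <- condition_contrast_sketch("samples.csv")
```

If the printed contrast shows two samples per group, the "replicates" in each group are really one me1 and one me2 sample, which is why the resulting FDR values deserve caution.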

-Rory

0
Hesh (University of Washington) wrote 20 days ago:

Thank you for the explanation!

Ideally, for a good analysis that gives statistically significant output, would you use 3 replicates per sample, or would 2 be sufficient?

0
Gord Brown (United Kingdom) wrote 20 days ago:

The second most-hated question among statistical and/or computational people (the first being "Do I need replicates?").

How many replicates you need depends on two parameters:

1) What effect size are you trying to detect, between the experimental groups?  (For example, is a fold change of 10:1 all you care about, or do you want to detect a mere 2-fold change?)

2) How variable are your replicates within the groups?

The smaller the effect size (difference between groups) you want to detect, the more replicates you need.  The larger the variability within your groups, the more replicates you need.  The underlying principle is known as statistical power, and there is no one answer.  With two replicates, you might find that some very dramatic changes appear statistically significant, but as Rory said, the statistics will be dubious at best.  Three is common, but usually at best barely adequate.  Lacking other information, we usually suggest 4, on the assumption that 1 will fail, and the remaining 3 may be barely adequate to give you the magic p<0.05 for some peaks.
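To make the trade-off concrete, here is a small base-R sketch using `power.t.test` from the stats package. It is a deliberate simplification: it treats a single peak's log2 signal as normally distributed and uses a two-sample t-test, whereas DiffBind relies on count-based models, and it ignores multiple-testing correction. The within-group standard deviation of 0.5 and the two effect sizes are made-up illustrative values, not estimates from any dataset.

```r
# Illustrative only: power of a two-sample t-test on log2 signal for one peak.
# All numbers below are assumptions chosen for the example.
sd_within    <- 0.5                                   # assumed within-group SD (log2 scale)
replicates   <- 2:6                                   # replicates per group
effect_sizes <- c(two_fold = log2(2), ten_fold = log2(10))

# Power to detect a difference `delta` with `n` replicates per group at p < 0.05
power_for <- function(delta, n) {
  power.t.test(n = n, delta = delta, sd = sd_within, sig.level = 0.05)$power
}

# Rows: number of replicates; columns: effect size
power_table <- sapply(effect_sizes, function(delta) {
  sapply(replicates, function(n) power_for(delta, n))
})
rownames(power_table) <- paste0("n=", replicates)

print(round(power_table, 2))
```

Under these assumptions, power rises with the number of replicates and is higher for the larger fold change at every group size. Increasing `sd_within` lowers every entry, which is the variability effect described above.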

I really strongly suggest you consult a statistician at your institution regarding experimental design.  You're putting a lot of time and effort into the lab work... a consultation with your local statistics clinic is a very low-cost way of enhancing the likelihood of a good research result from all your effort.

Best of luck,

 - Gord

0
Hesh (University of Washington) wrote 19 days ago:

Thank you for the explanation! I appreciate it a lot!
