Question

DiffBind or csaw between different transcription factors

GFM

@gfm-8326

European Union

Hello, I have two different transcription factors whose binding locations I would like to compare. Both factors carry a Myc tag, and ChIP-seq will be performed for each factor using an anti-Myc antibody. I am concerned that the two factors might have different FRiP (fraction of reads in peaks), which could bias the comparison. Is it OK to use DiffBind or csaw to compare these two ChIP-seq samples (with replicates, of course)? Thanks

Tags: DiffBind, csaw

Written by GFM; last updated by Aaron Lun.

Answer 1 · 2020-06-18

You can certainly use DiffBind or csaw to analyze this type of experiment.

You are correct to be concerned about normalization, particularly if there is a big difference in enrichment between the ChIPs for the two factors. If overall enrichment is similar but binding is widely reprogrammed (meaning about the same number of binding sites, but in very different places), you'll probably want to normalize based on the overall sequencing depth of the libraries (not reads in peaks). If the factors bind in similar places but one factor binds much less frequently, you'll want to avoid normalizing based on FRiP, as this will tend to minimize the real biological differences.
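To make the normalization point concrete, here is a toy Python sketch with made-up counts for a single binding site shared by both factors. Both libraries are assumed to have the same sequencing depth, but the second factor's ChIP is half as enriched (FRiP 0.10 vs 0.20). Scaling by total library size keeps the 2-fold difference at the site, while scaling by reads in peaks (which effectively equalizes FRiP) cancels it out. The function name and all numbers are hypothetical, and this is not how DiffBind or csaw actually compute their normalization factors; it only illustrates the arithmetic.

```python
import math

def log2_fold_change(count_a, count_b, size_a, size_b):
    """log2 ratio of two counts after dividing each by its normalization size."""
    return math.log2((count_a / size_a) / (count_b / size_b))

# Hypothetical toy data: equal sequencing depth, TF_B binds the same site
# but its ChIP is half as enriched (FRiP 0.10 vs 0.20).
total_reads = {"TF_A": 20_000_000, "TF_B": 20_000_000}
reads_in_peaks = {"TF_A": 4_000_000, "TF_B": 2_000_000}
site_count = {"TF_A": 200, "TF_B": 100}

# Normalizing by total library size keeps the 2-fold binding difference.
lfc_library = log2_fold_change(site_count["TF_A"], site_count["TF_B"],
                               total_reads["TF_A"], total_reads["TF_B"])

# Normalizing by reads in peaks (i.e. equalizing FRiP) cancels it out.
lfc_rip = log2_fold_change(site_count["TF_A"], site_count["TF_B"],
                           reads_in_peaks["TF_A"], reads_in_peaks["TF_B"])

print(f"library-size logFC:   {lfc_library:.2f}")  # 1.00
print(f"reads-in-peaks logFC: {lfc_rip:.2f}")      # 0.00
```

If the lower FRiP of TF_B reflects genuinely weaker binding, the second normalization has removed exactly the signal you were trying to measure.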

If you do a DiffBind-style analysis, where peaks for replicates are merged into a consensus peakset, you can look at a) the overlaps between the consensus peaksets for each of the two factors and b) the distribution of FRiP rates for each of the factors, to see whether these conditions hold.
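A rough Python sketch of these two checks, assuming peaks are plain (chrom, start, end) half-open tuples and using made-up data. The consensus step merges overlapping peaks across replicates and keeps regions supported by at least two replicates, similar in spirit to DiffBind's `minOverlap` threshold but not its actual implementation. It then computes how many of one factor's consensus peaks overlap the other's, and a per-library FRiP. All function names, peak coordinates and read counts are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def consensus_peaks(replicates, min_reps=2):
    """Merge overlapping peaks across replicates; keep merged regions
    supported by peaks from at least `min_reps` different replicates."""
    tagged = sorted(
        (chrom, start, end, rep)
        for rep, peaks in enumerate(replicates)
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set_of_replicate_ids]
    for chrom, start, end, rep in tagged:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3].add(rep)
        else:
            merged.append([chrom, start, end, {rep}])
    return [(c, s, e) for c, s, e, reps in merged if len(reps) >= min_reps]

def fraction_overlapping(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b."""
    if not peaks_a:
        return 0.0
    hits = sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)
    return hits / len(peaks_a)

def frip(reads_in_peaks, total_reads):
    """Fraction of reads in peaks for one library."""
    return reads_in_peaks / total_reads

# Hypothetical peak calls for two replicates of each factor.
tf_a_reps = [
    [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)],
    [("chr1", 150, 250), ("chr1", 900, 1000), ("chr2", 100, 180)],
]
tf_b_reps = [
    [("chr1", 120, 220), ("chr3", 10, 90)],
    [("chr1", 130, 210), ("chr3", 20, 80)],
]

consensus_a = consensus_peaks(tf_a_reps)
consensus_b = consensus_peaks(tf_b_reps)
print(consensus_a)  # [('chr1', 100, 250), ('chr2', 50, 180)]
print(consensus_b)  # [('chr1', 120, 220), ('chr3', 10, 90)]
print(fraction_overlapping(consensus_a, consensus_b))  # 0.5

# Hypothetical (reads in consensus peaks, total reads) per library.
libraries = {
    "TF_A_rep1": (4_100_000, 20_000_000),
    "TF_A_rep2": (3_900_000, 20_000_000),
    "TF_B_rep1": (1_900_000, 20_000_000),
    "TF_B_rep2": (2_100_000, 20_000_000),
}
for name, (rip, total) in libraries.items():
    print(f"{name}: FRiP = {frip(rip, total):.3f}")
```

A large, consistent FRiP gap between factors whose consensus peaks overlap heavily is the situation where normalizing on reads in peaks would hide a real difference.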

Answer 2 · 2020-06-19

I went through several stages of grief when I thought about this problem.

Stage 1: Denial

Quantitatively comparing binding profiles for two different proteins is fraught with danger. There are just so many technical problems relating to differences in the efficiency of cross-linking, DNA fragmentation, immunoprecipitation and so on. In your case, the worst of the problems are bypassed by using the same antibody for both proteins, but there are still so many other unknowns. Like... what if one TF binds as part of a larger complex that has a larger footprint, causing more of its fragments to be size-selected out during library preparation? Or if the larger complex is less amenable to cross-linking? Or if the conformation of one of the TF targets reduces the exposure of the Myc tag and the efficiency of the IP? Or... well, ChIP is such a black box, who knows what's going on in there.

Stage 2: Anger

Let's assume that all the technical hurdles have been overcome. At this point, we can treat the coverage as an accurate relative measure of the concentration of bound protein for each TF. Say we go on to test for differences and find that one of our TFs binds to a particular genomic location at twice the concentration of the other TF. Then what? What biological insight can be derived from this piece of information? It's like dividing four apples by two oranges. For all we know, the lower-concentration TF might have a greater transcriptional effect per unit of concentration (e.g., because it has a stronger activator domain, or it triggers more long-lasting histone modifications), so even though the binding is lower, it's actually doing more.

Now, this problem of interpretation is not completely insurmountable. If your TFs are reasonably related then there is still some hope; one obvious use case lies in comparing two TFs where one is a modified version of the other (e.g., with a binding domain deleted or added). But in the general case of comparing two TFs, I think it would be pretty easy to poke holes in any biological conclusion drawn from the differential analysis results.

Stage 3: Bargaining

If you're confident that none of the above problems apply, then you can proceed to the differential binding analysis. Rory's comments largely apply to csaw as well; see the csaw user's guide for more details. And in fact, despite all the issues, a DB analysis is still probably better than a simple setdiff on the peak sets to identify differences between the binding profiles. Personally, if I were looking at this experiment with two unrelated TFs, I would use glmTreat and crank up the log-fold change threshold to something like 3 to effectively get presence/absence calls. Don't bother with the small stuff like log-fold changes of 0.5 to 2; you want an easy-to-interpret statement like "TF X is present at these promoters and TF Y is not".
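The glmTreat suggestion refers to edgeR's TREAT-style test, which asks whether the absolute log-fold change exceeds a threshold while accounting for variability between replicates. The Python sketch below reproduces only the thresholding idea on hypothetical counts: a log2 threshold of 3 corresponds to an 8-fold difference, and windows beyond it in either direction get a presence/absence label. It does no dispersion estimation and produces no p-values, so it is not a substitute for glmTreat; the window names, counts, library sizes and the simplified prior-count handling are all assumptions for illustration.

```python
import math

# Threshold on the log2 scale: 3 means an 8-fold (2**3) difference.
LFC_THRESHOLD = 3.0

def log2_cpm(count, lib_size, prior_count=2.0):
    """log2 counts-per-million with a small prior count to avoid log(0).
    (A simplification; edgeR's own prior-count handling differs.)"""
    return math.log2((count + prior_count) * 1e6 / lib_size)

def mean_log2_cpm(counts, lib_sizes):
    """Average log2 CPM across replicates of one factor."""
    values = [log2_cpm(c, n) for c, n in zip(counts, lib_sizes)]
    return sum(values) / len(values)

def call_window(counts_x, counts_y, libs_x, libs_y, threshold=LFC_THRESHOLD):
    """Label a window as X only, Y only or not called by thresholding the
    log-fold change. No dispersion or p-value: not a replacement for glmTreat."""
    lfc = mean_log2_cpm(counts_x, libs_x) - mean_log2_cpm(counts_y, libs_y)
    if lfc > threshold:
        return "X only"
    if lfc < -threshold:
        return "Y only"
    return "not called"

libs = [1_000_000, 1_000_000]  # hypothetical library sizes, two replicates each
windows = {  # hypothetical counts: (TF X replicates, TF Y replicates)
    "promoter_1": ([98, 126], [0, 0]),
    "promoter_2": ([30, 30], [14, 14]),
    "promoter_3": ([0, 0], [62, 62]),
}
for name, (x, y) in windows.items():
    print(name, call_window(x, y, libs, libs))
# promoter_1 X only
# promoter_2 not called
# promoter_3 Y only
```

The 2-fold difference at promoter_2 is deliberately left uncalled: under this threshold, only large differences are turned into presence/absence statements.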

I don't have anything more to say right now, so I'll leave the other stages for the comments.