DiffBind or csaw between different transcription factors
GFM ▴ 20
@gfm-8326
Last seen 2.9 years ago
European Union

Hello, I have two different transcription factors whose binding locations I would like to compare. Both factors carry a Myc tag, and ChIP-seq will be performed for each factor using an antibody against Myc. I am concerned that the two factors might have different FRiP (fraction of reads in peaks), which could bias the comparison. Is it OK to use DiffBind / csaw to compare these two ChIP-seq samples (with replicates, of course)? Thanks

DiffBind csaw • 2.0k views
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 4 weeks ago
Cambridge, UK

You can certainly use DiffBind or csaw to analyze this type of experiment.

You are correct to be concerned about normalization, particularly if there is a big difference in enrichment between the ChIPs for the two factors. If overall enrichment is similar but binding is widely reprogrammed (meaning about the same number of binding sites, but in very different places), you'll probably want to normalize based on the overall sequencing depth of the libraries (not reads in peaks). If they bind in similar places, but one factor binds much less frequently, you want to avoid normalizing based on FRiP, as this will tend to minimize the biological differences.
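
To make the second case concrete, here is a minimal toy sketch in Python. All counts are hypothetical, and this is not how DiffBind or csaw compute their normalization factors; it only illustrates why scaling by reads in peaks can erase a genuine global difference when one factor binds the same sites at a lower level.

```python
# Toy illustration only: hypothetical counts, not DiffBind's or csaw's actual code.

def scale_counts(counts, library_size, reference_size):
    """Scale raw counts so the library behaves as if it had reference_size reads."""
    factor = reference_size / library_size
    return [c * factor for c in counts]

# Two factors bind the same three sites; factor B binds at half the level of A.
peaks_a = [400, 200, 100]
peaks_b = [200, 100, 50]
depth_a = depth_b = 1_000_000  # assumed: both libraries have the same total depth

# Option 1: normalize on full library size (all sequenced reads).
full_a = scale_counts(peaks_a, depth_a, 1_000_000)
full_b = scale_counts(peaks_b, depth_b, 1_000_000)

# Option 2: normalize on reads in peaks only.
rip_a = scale_counts(peaks_a, sum(peaks_a), 1_000)
rip_b = scale_counts(peaks_b, sum(peaks_b), 1_000)

# Per-site ratio of A to B under each scheme.
ratio_full = [a / b for a, b in zip(full_a, full_b)]  # the two-fold difference survives
ratio_rip = [a / b for a, b in zip(rip_a, rip_b)]     # the difference is scaled away

print(ratio_full)
print(ratio_rip)
```

With full library size the two-fold difference at every site is preserved; with reads in peaks both libraries are forced to the same in-peak total, so the sites look equivalent.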

If you do a DiffBind style analysis where peaks for replicates are merged into a consensus peakset, you can look at a) the overlaps between consensus peaksets for each of the two factors and b) the distribution of FRiP rates for each of the factors, in order to see if these conditions hold.
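
As a rough sketch of those two checks, the Python below computes the fraction of overlapping consensus peaks and per-replicate FRiP. The intervals, read counts, and helper names (`overlaps`, `fraction_overlapping`, `frip`) are all made up for illustration; in a real analysis these numbers would come from the peak sets and count data rather than from hand-written lists.

```python
# Toy illustration only: hypothetical intervals and read counts.

def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def fraction_overlapping(peaks, other):
    """Fraction of `peaks` that overlap at least one peak in `other`."""
    hits = sum(any(overlaps(p, q) for q in other) for p in peaks)
    return hits / len(peaks)

def frip(reads_in_peaks, total_reads):
    """Fraction of reads in peaks for one library."""
    return reads_in_peaks / total_reads

# Consensus peak sets for two hypothetical factors X and Y.
consensus_x = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
consensus_y = [("chr1", 150, 350), ("chr2", 5000, 5200)]

# a) How much of each consensus set is shared with the other?
shared_x = fraction_overlapping(consensus_x, consensus_y)
shared_y = fraction_overlapping(consensus_y, consensus_x)

# b) FRiP per replicate as (reads in peaks, total reads), two replicates per factor.
frip_x = [frip(r, t) for r, t in [(52_000, 1_000_000), (48_000, 1_000_000)]]
frip_y = [frip(r, t) for r, t in [(11_000, 1_000_000), (9_000, 1_000_000)]]

print(shared_x, shared_y)
print(frip_x, frip_y)
```

Low overlap with similar FRiP would point towards the "reprogrammed" case; high overlap with consistently lower FRiP for one factor would point towards the "same sites, less binding" case.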

Thank you very much!!

Aaron Lun ★ 28k
@alun
Last seen 14 hours ago
The city by the bay

I went through several stages of grief when I thought about this problem.

Stage 1: Denial

Quantitatively comparing binding profiles for two different proteins is fraught with danger. There are just so many technical problems relating to differences in the efficiency of the cross-linking, DNA fragmentation, immunoprecipitation and so on. In your case, the worst of the problems are bypassed by using the same antibody for both proteins, but there are still so many other unknowns. Like... if one TF is binding in a larger complex that has a larger footprint and causes more of its fragments to be size-selected out during library preparation. Or if the larger complex is less amenable to crosslinking. Or if the conformation of one of the TF targets reduces the exposure of the Myc tag and the efficiency of IP. Or... well, ChIP is such a black box, who knows what's going on in there.

Stage 2: Anger

Let's assume that all the technical hurdles have been overcome. At this point, we can treat the coverage as an accurate relative measure of the concentration of bound protein for each TF. Say we go on to test for differences and find that one of our TFs binds to a particular genomic location at twice the concentration of the other TF. Then what? What biological insight can be derived from this piece of information? It's like dividing four apples by two oranges. For all we know, the lower-concentration TF might have a greater transcriptional effect per unit of concentration (e.g., because it has a stronger activator domain, or it triggers more long-lasting histone modifications), so even though the binding is lower, it's actually doing more.

Now, this problem of interpretation is not completely insurmountable. If your TFs are reasonably related then there is still some hope; one obvious use case lies in comparing two TFs where one is a modified version of the other (e.g., with a binding domain deleted or added). But in the general case of comparing two TFs, I think it would be pretty easy to poke holes in any biological conclusion drawn from the differential analysis results.

Stage 3: Bargaining

If you're confident that none of the above problems apply, then you can proceed to the differential binding analysis. Rory's comments largely apply to csaw as well; see the user's guide for more details. And in fact, despite all the issues, a DB analysis is still probably better than a simple setdiff on the peak sets to identify differences between the binding profiles. Personally, if I were looking at this experiment with two unrelated TFs, I would use glmTreat and crank up the log-fold change threshold to something like 3 to effectively get presence/absence calls. Don't bother with the small stuff like log-fold changes of 0.5 to 2; you want an easy-to-interpret statement like "TF X is present at these promoters and TF Y is not".
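
To show what presence/absence calls at a log-fold change threshold of 3 look like, here is a deliberately crude Python sketch. It is not glmTreat: glmTreat performs a formal statistical test against the threshold on replicated, normalized counts, whereas this just applies a hard cutoff to single made-up counts. The window names, counts, prior count of 1, and function names are all assumptions for illustration.

```python
import math

# Toy illustration only: a hard cutoff on hypothetical counts, not the glmTreat test.

def log2_fold_change(count_x, count_y, prior=1.0):
    """log2(X / Y) with a small prior count to avoid division by zero."""
    return math.log2((count_x + prior) / (count_y + prior))

def call(count_x, count_y, threshold=3.0):
    """Label a window by whether the log-fold change clears the threshold."""
    lfc = log2_fold_change(count_x, count_y)
    if lfc >= threshold:
        return "X only"
    if lfc <= -threshold:
        return "Y only"
    return "no call"

# Hypothetical (TF X count, TF Y count) per window.
windows = {
    "promoter_1": (255, 15),  # log2(256 / 16) = 4
    "promoter_2": (40, 30),   # small difference, below the threshold
    "promoter_3": (3, 127),   # log2(4 / 128) = -5
}

calls = {name: call(x, y) for name, (x, y) in windows.items()}
print(calls)
```

Only windows with at least an eight-fold difference get a call, which is what turns the output into statements of the form "X is here and Y is not" rather than a ranking of modest changes.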

I don't have anything more to say right now, so I'll leave the other stages for the comments.


Thank you very much!!
