DiffBind consensus peak set
ep15587

I am analysing the output of ChIP-seq for two transcription factors and one histone modification. We have 3 timepoints (e.g. A, B and C) and 4 samples per timepoint. I have been analysing the output of DiffBind, which, if I understand correctly, produces a consensus peak set derived from the MACS2 peak calls (peaks that are significant over the corresponding input sample). A peak is brought into the consensus set if it is present in 2 or more samples. I have two questions regarding this consensus set that I am hoping you can help me with.

1. I have 2 DiffBind outputs for each factor, looking at timepoint B vs A and C vs A. I have noticed cases where a peak that is significantly differentially bound in, say, C vs A does not appear in the MACS2 output of any of the 4 samples at timepoint C, i.e. it has not been peak called (not significant over input). In this case I assume the peak has been brought through to the DiffBind consensus set because it is in 2 or more samples of timepoint A or B. My question is: is it okay that this peak is not significant over input at that timepoint but is significant in DiffBind? I know DiffBind and MACS2 are different methodologies. How does DiffBind take the input into account?

2. Due to the consensus set, I have exactly the same number and location of peaks in the DiffBind output for B vs A and C vs A. I understand why this is the case. If I want to look at the differences between the peaks that are differentially bound in BvsA and those in CvsA, I can obviously use an FDR cut-off; the numbers and locations of peaks then become different, and I can assess similarities and differences between the timepoints. However, what if I want to look at all peaks, including the non-significant DiffBind peaks? For the histone modification, I believe it could change significantly between timepoints (allowing TF binding), but there can also be other locations where it is enriched (the peak is significant over input) but the level does not change with timepoint, i.e. the site is already primed and ready for TF binding. In this context, the consensus peak set being the same for both BvsA and CvsA is an issue if taken as a whole. How can I assess the differences/similarities between BvsA and CvsA when they are exactly the same in terms of numbers and locations of peaks? Or should DiffBind only be used for looking at significantly differentially bound peaks, and not at peaks that don't change?

I hope that you can help and you can understand the points I am trying to make. I have not done the MACS2 and DiffBind myself, rather I am analysing their output so in terms of the methodology I am trying to fully understand it and from that know what to use for what biological question I have. Any help/advice you could give me would be greatly appreciated.

Rory Stark
CRUK, Cambridge, UK
1. Yes, it is quite possible that an interval not identified as a peak in any of the samples in a particular contrast can show up as differentially bound. Peak calling is a fairly imprecise and noisy step. The location may not have had enough reads to be called as a peak by MACS, but may still have significantly different read concentrations between the two conditions. You may choose to exclude these from the final results if you wish, or examine them in a browser. Regarding input, by default DiffBind subtracts the overlapping control reads from the overlapping ChIP reads for each sample. This isn't a particularly principled approach, but it does dampen regions that have high control coverage. (The treatment of control reads is changing somewhat in the next version of DiffBind.)

2. Technically, DiffBind (really, the underlying analysis packages DESeq2 and edgeR) computes confidence statistics indicating that a peak may be differentially bound, but does not assess confidence that a peak has not changed (only that there is insufficient evidence that it has changed). You can always look at the underlying statistics and read counts by raising the FDR threshold (th=1 in reports and plots will include all sites). In your BvsA and CvsA example, the idea is that you are looking at all the "candidate" sites and assessing their change in each of the two comparisons. If there are "primed" sites with the histone mark already present in A, they will not change between A and B or A and C, but they may still be interesting depending on the scientific question.
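To make point 2 concrete, here is a minimal sketch in Python (not DiffBind code) of how two contrasts computed over the same consensus set can be compared site by site. It assumes the full reports (all sites, as with th=1) have been exported and reduced to one FDR value per site. The site names, FDR values, the 0.05 threshold and the function name classify_sites are all hypothetical.

```python
def classify_sites(fdr_b_vs_a, fdr_c_vs_a, threshold=0.05):
    """Label every consensus site by which contrast(s) call it differentially bound.

    Both inputs map site id -> FDR and must cover the same consensus sites.
    """
    if set(fdr_b_vs_a) != set(fdr_c_vs_a):
        raise ValueError("both contrasts must be reported over the same consensus set")

    labels = {}
    for site, fdr_b in fdr_b_vs_a.items():
        changed_b = fdr_b <= threshold
        changed_c = fdr_c_vs_a[site] <= threshold
        if changed_b and changed_c:
            labels[site] = "both"
        elif changed_b:
            labels[site] = "B_vs_A_only"
        elif changed_c:
            labels[site] = "C_vs_A_only"
        else:
            # Not evidence of "no change", only a lack of evidence for change.
            labels[site] = "no_evidence_of_change"
    return labels


# Hypothetical FDR values for four consensus sites.
FDR_B_VS_A = {"site1": 0.001, "site2": 0.40, "site3": 0.03, "site4": 0.90}
FDR_C_VS_A = {"site1": 0.002, "site2": 0.01, "site3": 0.60, "site4": 0.85}

LABELS = classify_sites(FDR_B_VS_A, FDR_C_VS_A)
for site, label in sorted(LABELS.items()):
    print(site, label)
```

Sites that land in the last category are the candidates for "primed" regions, but as noted above, a high FDR only means there is insufficient evidence of change; the read counts themselves still need to be inspected.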

Another alternative is to split the analysis into separate questions for the different timepoints and use a different consensus peak set for each, but this is usually not the best way to go about it. The fact that there may be peaks present in all conditions that do not change their binding profile between specific times (or even ever) should not interfere with the analysis.
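Returning to point 1, the two ideas there (subtracting control reads, and optionally setting aside differentially bound sites that were never peak called in the contrast) can be sketched as follows. This is an illustration in Python, not DiffBind's implementation. In particular, the floor of 1 on subtracted counts, the absence of any scaling of the control reads, and the names subtract_control and uncalled_in_contrast are assumptions made for the example, and all counts are made up.

```python
def subtract_control(chip_reads, control_reads, min_count=1):
    """Subtract overlapping control reads from ChIP reads for one site in one sample.

    The floor (min_count) is an assumption of this sketch; it keeps counts positive.
    """
    return max(chip_reads - control_reads, min_count)


def uncalled_in_contrast(db_sites, called_counts, groups):
    """Return the differentially bound sites that were not called as a peak
    in any sample of the groups that make up the contrast.

    called_counts maps site id -> {group: number of samples with a called peak}.
    """
    return [
        site
        for site in db_sites
        if all(called_counts[site].get(group, 0) == 0 for group in groups)
    ]


# Hypothetical example: siteX entered the consensus set via timepoint B only.
CALLED = {
    "siteX": {"A": 0, "B": 3, "C": 0},
    "siteY": {"A": 4, "B": 4, "C": 0},
}
FLAGGED = uncalled_in_contrast(["siteX", "siteY"], CALLED, ("A", "C"))

print(subtract_control(120, 30))  # high ChIP signal, modest control
print(subtract_control(10, 25))   # control exceeds ChIP, so the floor applies
print(FLAGGED)
```

Flagged sites are not necessarily wrong; they are the ones worth checking in a genome browser before deciding whether to keep them.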


Thank you very much for your comments. That has definitely clarified my questions and bettered my understanding. From what you have said and what I have seen I won’t exclude those peaks that aren’t peak called in the samples at that condition. When I do look at these instances in a genome viewer, you can clearly see a difference in the peak between conditions which is confirmed by the DiffBind, but it hasn’t been peak called. In this way DiffBind is identifying significantly differentially bound peaks that would otherwise be missed if it was based on the fact the peak had to be peak called in all of the samples of that condition.

Again, thank you very much for your response.