I am analysing ChIP-seq output for two transcription factors and one histone modification. We have 3 timepoints (e.g. A, B and C) and 4 samples per timepoint. I have been analysing the output of DiffBind which, if I understand correctly, produces a consensus peak set derived from the MACS2 peak calls (peaks that are significant over the corresponding input sample). A peak is brought into the consensus set if it is present in 2 or more samples. I have two questions about this consensus set that I am hoping you can help me with.
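To check that I have the consensus step right, here is a minimal sketch of how I understand it. It is plain Python on toy intervals and is not DiffBind's actual code; the `Peak` tuple, the `consensus_peaks` function and the `min_overlap` parameter are names I made up for illustration, and I am assuming half-open, BED-style coordinates.

```python
from collections import namedtuple

# Hypothetical minimal peak record; coordinates assumed half-open (BED-style).
Peak = namedtuple("Peak", ["chrom", "start", "end"])


def consensus_peaks(samples, min_overlap=2):
    """Merge overlapping peaks across samples and keep merged regions
    that are supported by at least `min_overlap` distinct samples.

    samples: dict mapping sample name -> list of Peak
    """
    # Flatten to (chrom, start, end, sample) and sort by position.
    tagged = sorted(
        (p.chrom, p.start, p.end, name)
        for name, peaks in samples.items()
        for p in peaks
    )

    merged = []  # each entry: [chrom, start, end, set of supporting samples]
    for chrom, start, end, name in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            # Overlaps the current merged region: extend it and record the sample.
            last[2] = max(last[2], end)
            last[3].add(name)
        else:
            merged.append([chrom, start, end, {name}])

    return [Peak(c, s, e) for c, s, e, names in merged if len(names) >= min_overlap]


# Toy example: the first region is seen in two samples, the second in only one.
toy = {
    "A1": [Peak("chr1", 100, 200)],
    "A2": [Peak("chr1", 150, 250)],
    "C1": [Peak("chr1", 1000, 1100)],
}
print(consensus_peaks(toy))
```

Note that in this sketch the supporting samples are counted across all timepoints together, which is why a region called only at timepoint A can still end up in the set used for C vs A.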
I have 2 DiffBind outputs for each factor, looking at timepoint B vs A and C vs A. I have noticed cases where a peak that is significantly differentially bound in, say, C vs A does not appear in the MACS2 output of any of the 4 samples at timepoint C, i.e. it has not been peak called (it is not significant over input). In this case I assume the peak has been brought through to the DiffBind consensus set because it is in 2 or more samples of timepoint A or B. My question is: is it okay that this peak is not significant over input at that timepoint but is significant in DiffBind? I know DiffBind and MACS2 are different methodologies. How does DiffBind take the input into account?
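For reference, this is roughly how I am checking whether a consensus peak was called at a given timepoint. It is only a sketch under my own assumptions: peaks are `(chrom, start, end)` tuples with half-open coordinates, and the function names and sample names are hypothetical, not anything from MACS2 or DiffBind.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def called_in(peak, macs2_peaks_by_sample, sample_names):
    """Return the samples (from sample_names) whose MACS2 peak list
    contains at least one peak overlapping `peak`."""
    return [
        name
        for name in sample_names
        if any(overlaps(peak, p) for p in macs2_peaks_by_sample[name])
    ]


# Toy data: the region is called in both A samples but in neither C sample.
macs2 = {
    "A1": [("chr1", 100, 200)],
    "A2": [("chr1", 150, 250)],
    "C1": [],
    "C2": [("chr2", 100, 250)],  # same coordinates, different chromosome
}
consensus_peak = ("chr1", 100, 250)

print(called_in(consensus_peak, macs2, ["A1", "A2"]))
print(called_in(consensus_peak, macs2, ["C1", "C2"]))
```

An empty list for the timepoint C samples is the situation I am describing above: the region is in the consensus set and is reported as differentially bound, but no MACS2 peak at timepoint C overlaps it.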
Because of the consensus set, I have exactly the same number and location of peaks in the DiffBind output for B vs A and C vs A. I understand why this is the case. If I want to compare the peaks that are differentially bound in B vs A with those in C vs A, I can obviously apply an FDR cutoff; the numbers and locations of peaks then differ and I can assess similarities and differences between the timepoints. But what if I want to look at all peaks, including those that are not significant in DiffBind? For the histone modification, I believe the level could change significantly between timepoints (allowing TF binding), but there could also be other locations where the mark is enriched (the peak is significant over input) and the level does not change with timepoint, i.e. the site is already primed and ready for TF binding. In this context, the consensus peak set being identical for B vs A and C vs A is an issue if it is taken as a whole. How can I assess the differences and similarities between B vs A and C vs A when they are exactly the same in terms of numbers and locations of peaks? Or should DiffBind only be used for looking at significantly differentially bound peaks and not at peaks that don't change?
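To make this second question concrete, this is the kind of comparison I have in mind, written as a small sketch. I am assuming each contrast can be reduced to a mapping from peak ID to FDR over the same consensus peaks; the peak IDs, the FDR values and the 0.05 cutoff are all made up for illustration.

```python
def compare_contrasts(fdr_b_vs_a, fdr_c_vs_a, cutoff=0.05):
    """Group consensus peaks by whether they pass the FDR cutoff
    in B vs A, in C vs A, in both, or in neither.

    Both arguments map the same peak IDs to an FDR value.
    """
    groups = {"both": [], "B_only": [], "C_only": [], "neither": []}
    for peak_id in sorted(fdr_b_vs_a):
        sig_b = fdr_b_vs_a[peak_id] < cutoff
        sig_c = fdr_c_vs_a[peak_id] < cutoff
        if sig_b and sig_c:
            key = "both"
        elif sig_b:
            key = "B_only"
        elif sig_c:
            key = "C_only"
        else:
            key = "neither"
        groups[key].append(peak_id)
    return groups


# Made-up FDR values for four consensus peaks.
b_vs_a = {"peak1": 0.001, "peak2": 0.2, "peak3": 0.03, "peak4": 0.8}
c_vs_a = {"peak1": 0.004, "peak2": 0.01, "peak3": 0.6, "peak4": 0.9}

for label, ids in compare_contrasts(b_vs_a, c_vs_a).items():
    print(label, ids)
```

The "neither" group is where I would expect the already-primed sites to fall, although I realise that failing the cutoff is not proof that nothing changed, and that being in the consensus set does not guarantee the peak was called at every timepoint.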
I hope you can help and that the points I am trying to make are clear. I have not run MACS2 and DiffBind myself; rather, I am analysing their output, so I am trying to fully understand the methodology and, from that, know what to use for which biological question. Any help or advice you could give me would be greatly appreciated.
Thank you very much for your comments. They have definitely answered my questions and improved my understanding. From what you have said and what I have seen, I won’t exclude peaks that were not called in the samples of a given condition. When I look at these cases in a genome viewer, there is a clear difference in the peak between conditions, which DiffBind confirms, even though the peak hasn’t been called. In this way DiffBind identifies significantly differentially bound peaks that would otherwise be missed if a peak had to be called in all of the samples of that condition.
Again, thank you very much for your response.