Differential ChIP-seq analysis
bioinfo • 0
@bioinfo-12782
Last seen 3 months ago
United States

I would like to compare two conditions, each with two ChIP-seq (H3K27me3) samples. I searched for differential binding tools and found that many, including DiffBind and csaw, compare the two conditions using read counts from the pull-down libraries only, without using the input/control libraries. csaw has a step that uses the input/control, but it is essentially for filtering regions rather than for the statistical test for differential binding. I don't understand why these tools do not consider the input/control read counts. If the input/control differs between the two conditions, this should be taken into account when testing for differential binding. Could you explain the rationale behind these algorithms?

diffbind csaw • 779 views
Aaron Lun ★ 27k
@alun
Last seen 1 hour ago
The city by the bay

Also see the discussion at A: DESeq2 for ChIP-seq differential peaks.

I also have some more extended comments at:

https://github.com/LTLA/ChIPSeqThoughts/blob/master/subtract_control/subtract_control.Rmd

In short, there are statistical issues that are not easily resolved when dealing with negative control samples. And that's not even considering the gross technical differences between ChIP libraries and, say, input controls. For example, we often see that the latter clearly have a different distribution of fragment lengths on a Tapestation, and we also see consistent increases in the coverage of certain regions in the input. This raises the question of whether input controls are suitable negative controls at all, never mind whether they can be used in a differential binding (DB) analysis.
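To give a feel for one such statistical issue, here is a minimal base-R sketch of what happens when control counts are subtracted from ChIP counts. It is only an illustration under assumed conditions, not the simulation from the linked report: the counts are drawn as independent Poisson variables, and the means (`chip_mean`, `input_mean`) and the number of windows are made-up values.

```r
set.seed(1)

n_windows  <- 10000  # hypothetical number of genomic windows
chip_mean  <- 20     # hypothetical mean ChIP count per window
input_mean <- 15     # hypothetical mean input count per window

# Simulate independent Poisson counts for the ChIP and input libraries.
chip  <- rpois(n_windows, chip_mean)
input <- rpois(n_windows, input_mean)

# Naive control subtraction.
subtracted <- chip - input

# For a Poisson variable the variance equals the mean. After subtraction
# the means subtract but the variances add, so that relationship is lost,
# and some of the "counts" become negative.
summary_stats <- c(
  mean          = mean(subtracted),
  variance      = var(subtracted),
  frac_negative = mean(subtracted < 0)
)
print(summary_stats)
```

With these settings the subtracted values have a mean near 5 but a variance near 35, and a sizeable fraction are negative, so they no longer behave like counts that a count-based model can be applied to directly.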


Thanks for your answer. It is much clearer to me now. By the way, on the page you linked, I cannot read the math notation in the wrapping-up remarks, probably due to browser formatting. Can you check it?


The mathematical notation in the report doesn't render properly because it's not recognised by GitHub. The solution is easy: just clone the repository and compile the report with rmarkdown::render("subtract_control.Rmd"). Then you get to see the results of the simulations as well.

Rory Stark ★ 4.5k
@rory-stark-5741
Last seen 4 days ago
CRUK, Cambridge, UK

Input reads can also be used to identify problematic regions to be filtered from further analysis, i.e. blacklisting. You should already be using the derived blacklists as per the ENCODE guidelines to filter reads. You can also use the GreyListChIP package to identify anomalous enrichment in your Input samples and filter these regions out as well, prior to the differential analysis. If you are using a peak caller at any stage, the blacklisting should occur before peak calling. Note that most peak callers also use the Input samples to identify enriched intervals.
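As a rough sketch of the region-filtering step, the snippet below drops candidate regions that overlap a set of problematic intervals, using GenomicRanges. All coordinates are made up for illustration; in practice `problem` would be loaded from an ENCODE blacklist and/or a greylist derived from your Input samples, and `regions` would be your own windows or peaks. It does not show how the greylist itself is generated, nor the read-level filtering that should happen before peak calling.

```r
library(GenomicRanges)

# Hypothetical candidate regions (e.g. windows or peaks).
regions <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 5000, 300), end = c(600, 5500, 800))
)

# Hypothetical problematic intervals, standing in for a blacklist
# combined with a greylist.
problem <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(5200, 2000), end = c(5300, 2500))
)

# Keep only the regions that do not overlap any problematic interval.
kept <- regions[!overlapsAny(regions, problem)]
print(kept)
```

Here the second region (chr1:5000-5500) is dropped because it overlaps a problematic interval, while the other two are retained for the differential analysis.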
