Differential ChIP-seq analysis
bioinfo • 0
@bioinfo-12782
Last seen 2.3 years ago
United States

I would like to compare two conditions, each with two H3K27me3 ChIP-seq samples. I searched for differential binding tools and found that many of them, including DiffBind and csaw, compare the two conditions using read counts from the pull-down libraries only, without using the input/control libraries. csaw has a step that uses the input/control, but it is essentially for filtering regions, not for the statistical test of differential binding. I don't understand why these tools do not consider the input/control read counts. If the input/control differs between the two conditions, shouldn't this be taken into account when testing for differential binding? Could you explain the rationale behind these algorithms?

diffbind csaw • 1.4k views
Aaron Lun ★ 28k
@alun
Last seen 13 hours ago
The city by the bay

See A: csaw - workflow to incorporate input/control samples?.

Also see the discussion at A: DESeq2 for ChIP-seq differential peaks.

I also have some more extended comments at:

https://github.com/LTLA/ChIPSeqThoughts/blob/master/subtract_control/subtract_control.Rmd

In short, there are statistical issues that are not easily resolved when dealing with negative control samples. And that's not even considering the gross technical differences between ChIP libraries and, say, input controls. For example, we often see that the latter clearly have a different distribution of fragment lengths on a TapeStation, and we also see consistent increases in the coverage of certain regions in the input. This raises the question of whether input controls are suitable negative controls at all, never mind whether they can be used in a DB analysis.
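
One way to see the kind of statistical issue involved is the base R sketch below. It is an illustration under assumed conditions (independent Poisson counts, with made-up means `mu_chip` and `mu_input`), not a reproduction of the simulations in the linked report. It shows that subtracting input counts from ChIP counts gives values that no longer behave like counts: they can be negative, and their variance is the sum of the two variances rather than being tied to the new mean.

```r
# Hypothetical per-window mean counts; both values are assumptions for illustration.
mu_chip <- 10
mu_input <- 8
n_windows <- 1e5

set.seed(42)
chip <- rpois(n_windows, mu_chip)    # simulated ChIP counts
input <- rpois(n_windows, mu_input)  # simulated input counts, independent of ChIP

subtracted <- chip - input

# For independent Poisson counts, the difference has
# mean = mu_chip - mu_input and variance = mu_chip + mu_input.
expected_mean <- mu_chip - mu_input
expected_var <- mu_chip + mu_input

cat("mean of subtracted values:    ", mean(subtracted), "\n")
cat("variance of subtracted values:", var(subtracted), "\n")
cat("fraction below zero:          ", mean(subtracted < 0), "\n")
```

A Poisson model would expect the variance to equal the mean (2 in this setup), whereas the subtracted values have a variance close to 18. That mismatch is one reason subtracted values cannot simply be passed to count-based methods.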

Thanks for your answer. It is much clearer to me now. By the way, on the page you linked, I cannot read the math notation in the wrapping-up remarks, probably due to browser formatting. Could you check it?

The mathematical notation in the report doesn't render properly because GitHub doesn't recognise it. The solution is easy: just clone the repository and compile the report with rmarkdown::render("subtract_control.Rmd"). You will then get to see the results of the simulations as well.
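
The clone-and-compile steps might look like the sketch below. The repository URL and the path to the .Rmd file are inferred from the link posted earlier in this thread, and the helper name `render_report` is made up for illustration. It assumes that git and the rmarkdown package are installed, along with whatever packages the report itself loads.

```r
# Repository URL and file path inferred from the link posted earlier in this thread.
repo_url <- "https://github.com/LTLA/ChIPSeqThoughts"
repo_dir <- "ChIPSeqThoughts"
rmd_path <- file.path(repo_dir, "subtract_control", "subtract_control.Rmd")

# Hypothetical helper: optionally clone the repository, then compile the report.
render_report <- function(clone = FALSE) {
    if (clone && !dir.exists(repo_dir)) {
        # Requires git on the PATH and network access.
        system2("git", c("clone", repo_url, repo_dir))
    }
    if (!file.exists(rmd_path)) {
        stop("report source not found at ", rmd_path)
    }
    # Requires the rmarkdown package, plus any packages the report itself loads.
    rmarkdown::render(rmd_path)
}
```

Calling `render_report(clone = TRUE)` from an R session should then write the compiled report next to the source file, with the math rendered.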

Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 4 weeks ago
Cambridge, UK

Input reads can also be used to identify problematic regions to be filtered from further analysis, i.e. blacklisting. You should already be using the derived blacklists as per the ENCODE guidelines to filter reads. You can also use the GreyListChIP package to identify anomalous enrichment in your Input samples and filter these regions out as well, prior to the differential analysis. If you are using a peak caller at any stage, the blacklisting should occur before peak calling. Note that most peak callers also use the Input samples to identify enriched intervals.
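
The filtering step itself can be sketched in base R as below. The coordinates, the data frame layout and the helper name `overlaps_any` are all made up for illustration. Real blacklists and greylists would be loaded from files or generated by the relevant packages, and a Bioconductor workflow would more likely use range-aware classes than plain data frames. The same overlap test applies whether the intervals are reads or candidate regions.

```r
# Toy candidate regions (1-based, closed intervals); coordinates are invented.
regions <- data.frame(
    chr = c("chr1", "chr1", "chr2"),
    start = c(100, 500, 100),
    end = c(200, 600, 200),
    stringsAsFactors = FALSE
)

# Toy blacklist in the same layout; also invented.
blacklist <- data.frame(
    chr = c("chr1", "chr2"),
    start = c(150, 1000),
    end = c(300, 2000),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for each query interval that overlaps any subject interval.
overlaps_any <- function(query, subject) {
    vapply(seq_len(nrow(query)), function(i) {
        any(subject$chr == query$chr[i] &
            subject$start <= query$end[i] &
            subject$end >= query$start[i])
    }, logical(1))
}

in_blacklist <- overlaps_any(regions, blacklist)
filtered <- regions[!in_blacklist, ]
print(filtered)
```

In this toy input only the first region overlaps the blacklist, so it is the only one dropped before any downstream analysis.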

 

Thanks for your answer!
