Question

Differential ChIP-seq analysis

bioinfo • 0

I would like to compare two conditions, each with two H3K27me3 ChIP-seq samples. I searched for differential binding tools and found that many, including DiffBind and csaw, compare the two conditions using read counts from the pull-down libraries only, without using the input/control read counts. csaw has a step that uses the input/control, but it is essentially for filtering regions, not for the statistical test of differential binding. I don't understand why these tools do not consider the input/control read counts. If the input/control differs between the two conditions, shouldn't this be taken into account when testing for differential binding? Could you explain the rationale behind these algorithms?

diffbind csaw • 1.9k views

Written 7.5 years ago by bioinfo • updated 7.5 years ago by Rory Stark

Rory Stark ★ 5.2k

Input reads can also be used to identify problematic regions to be filtered from further analysis, i.e. blacklisting. You should already be using the blacklists derived as per the ENCODE guidelines to filter reads. You can also use the GreyListChIP package to identify anomalous enrichment in your Input samples and filter these regions out as well, prior to the differential analysis. If you are using a peak caller at any stage, the blacklisting should occur before peak calling. Note that most peak callers also use the Input samples to identify enriched intervals.
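As a rough sketch of the grey-listing step, the following wraps the sequence of calls shown in the GreyListChIP vignette. The wrapper name `make_greylist`, the file arguments, and the output name are hypothetical, and the parameter values are illustrative only, so check the package documentation for your version before relying on them.

```r
# Hypothetical helper around the GreyListChIP workflow; not part of the package.
make_greylist <- function(karyo_file, input_bam, out_bed = "greylist.bed") {
    library(GreyListChIP)  # Bioconductor package; must be installed

    # Tile the genome described by the karyotype (chromosome sizes) file.
    gl <- new("GreyList", karyoFile = karyo_file)

    # Count Input reads in each tile.
    gl <- countReads(gl, input_bam)

    # Estimate a read-count threshold from repeated subsamples of the tiles.
    gl <- calcThreshold(gl, reps = 10, sampleSize = 1000, p = 0.99, cores = 1)

    # Merge tiles above the threshold into grey-list regions.
    gl <- makeGreyList(gl, maxGap = 16384)

    # Write the regions out so they can be excluded alongside the blacklist.
    export(gl, con = out_bed)
    invisible(gl)
}
```

A call would look like `make_greylist("karyotype.txt", "input.bam")`, where both file names are placeholders for your own data.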

Answer by Rory Stark, 7.5 years ago

Thanks for your answer !

Reply by bioinfo, 7.5 years ago

score 2 · Accepted Answer · 2018-07-06

Aaron Lun ★ 29k

See A: csaw - workflow to incorporate input/control samples?.

Also see the discussion at A: DESeq2 for ChIP-seq differential peaks.

I also have some more extended comments at:

https://github.com/LTLA/ChIPSeqThoughts/blob/master/subtract_control/subtract_control.Rmd

In short, there are statistical issues that are not easily resolved when dealing with negative control samples. And that's not even considering the gross technical differences between ChIP libraries and, say, input controls. For example, we often see that input libraries clearly have a different distribution of fragment lengths on a Tapestation, and we also see consistent increases in the coverage of certain regions in the input. This raises the question of whether input controls are suitable negative controls at all, never mind whether they can be used in a differential binding (DB) analysis.
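To make one such statistical issue concrete, here is a minimal base R simulation of what happens if control counts are simply subtracted from ChIP counts. This is only an illustration under assumed Poisson counts with made-up means (30 and 20), not necessarily the argument the linked report emphasizes: the difference of two independent Poisson counts has a variance equal to the sum of the two means, and it can be negative, so it no longer behaves like the count data that count-based models expect.

```r
set.seed(1)
n <- 10000

# Hypothetical counts for one region across many simulated replicates.
chip  <- rpois(n, lambda = 30)  # assumed ChIP mean
input <- rpois(n, lambda = 20)  # assumed input mean

# Naive "control-corrected" signal.
corrected <- chip - input

mean(corrected)      # close to 30 - 20 = 10
var(corrected)       # close to 30 + 20 = 50, far larger than the mean
mean(corrected < 0)  # a non-trivial fraction of values is negative
```

The variances add while the means subtract, so the corrected values are much noisier relative to their mean than the original counts were.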

Answer by Aaron Lun, 7.5 years ago

Thanks for your answer. It is much clearer to me now. By the way, on the page you linked, I cannot read the math notation in the wrapping-up remarks, probably due to browser formatting. Can you check it?

Reply by bioinfo, 7.5 years ago

The mathematical notation in the report doesn't render properly because it's not recognised by GitHub. The solution is easy: just clone the repository and compile the report with rmarkdown::render("subtract_control.Rmd"). Then you get to see the results of the simulations as well.
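Spelled out, the clone-and-compile step might look like the sketch below. The helper name `render_subtract_control` and the `dest` directory are hypothetical; it assumes git and the rmarkdown package are installed, and the report may need further R packages that are not listed here.

```r
# Hypothetical helper; assumes git and the rmarkdown package are available.
render_subtract_control <- function(dest = "ChIPSeqThoughts") {
    # Clone the repository only if it is not already present.
    if (!dir.exists(dest)) {
        system2("git", c("clone", "https://github.com/LTLA/ChIPSeqThoughts.git", dest))
    }

    # Compile the report; by default the output is written next to the .Rmd file.
    rmarkdown::render(file.path(dest, "subtract_control", "subtract_control.Rmd"))
}
```

Calling `render_subtract_control()` from an R session should then produce the compiled report, with the math rendered, inside the cloned `subtract_control` directory.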

Reply by Aaron Lun, 7.5 years ago