I read through the other posts about using RNA-seq packages to analyze ChIP-seq data (https://support.bioconductor.org/p/72098/). The general consensus seems to be to either ignore the input control samples or use them to build a blacklist, and then look at differential binding between the IP samples only.
If I do want to incorporate the input controls into my differential binding analysis, is it valid to include that data as another factor in the design matrix and test a difference of differences?
So for every library, I would have an "IP" factor with two levels (IP/input) and a "sample" factor with two levels (treatment/control).
I am not very well versed in R; would it be possible to set up this kind of contrast, and would it even be valid? I've sketched what I mean below.
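Something along these lines is what I had in mind, though I'm not sure it's right. This is only a sketch assuming an edgeR-style workflow on window/peak counts; the sample layout and object names are made up:

```r
library(edgeR)

# Hypothetical layout: 8 libraries, an input and an IP library for each of
# 2 control and 2 treatment samples.
ip <- factor(rep(c("Input", "IP"), 4), levels = c("Input", "IP"))
treatment <- factor(rep(c("control", "treatment"), each = 4),
                    levels = c("control", "treatment"))

# The ip:treatment interaction term is the "difference of differences":
# (IP - Input) in treatment minus (IP - Input) in control.
design <- model.matrix(~ ip * treatment)

# 'counts' would be a matrix of window or peak counts (e.g. from csaw).
# y <- DGEList(counts)
# y <- estimateDisp(y, design)
# fit <- glmQLFit(y, design)
# res <- glmQLFTest(fit, coef = "ipIP:treatmenttreatment")
```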
Thanks. I am trying to understand what you meant when you wrote:
Are you saying that, for a given genomic region with signal in the input control, that signal is composed of various factors (mappability, GC content, open chromatin, and so on), and that one of those factors could be the very protein of interest that we are IP-ing? In that case, trying to measure a difference between the IP for the protein of interest and the input control (which already includes signal from that protein) would not be productive, since you would essentially be cancelling out the IP vs. input signal.
Yes, the risk is that the changes in the input signal (e.g., due to changes in chromatin accessibility) are biologically correlated with genuine changes in ChIP signal (e.g., due to more protein binding in regions of open chromatin). So if you use the former to "correct" the latter, you would weaken or lose the DB effect. Anecdotes suggest that this does indeed happen, which is why I whine about it.
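To make that concrete, here is a toy illustration with made-up log2 coverage values for a single region; the numbers are invented purely to show the arithmetic:

```r
# Made-up log2 coverage for one region where chromatin opens up in treatment,
# which increases both genuine binding and the input signal.
ip.log2    <- c(control = 5, treatment = 7)  # genuine 4-fold increase in binding
input.log2 <- c(control = 3, treatment = 4)  # accessibility also rises

# DB effect from the IP samples alone: a log2-fold change of 2.
ip.log2[["treatment"]] - ip.log2[["control"]]

# After "correcting" by input (the difference of differences): only 1,
# i.e. half of the genuine effect has been cancelled out.
(ip.log2[["treatment"]] - input.log2[["treatment"]]) -
    (ip.log2[["control"]] - input.log2[["control"]])
```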