The setup: ChIP-seq for histone marks in two closely related cell lines that represent certain developmental stages.
The problem: We know from cytogenetics that one of the cell lines carries genetic anomalies, such as a triplicated chromosome 1 (trisomy), that will probably increase the counts from peaks on that chromosome by roughly 50% (three copies instead of two). Input samples (total chromatin input) are available.
The question: What is the current best practice within the edgeR framework for correcting systematic differences in input abundance, such as the one caused by trisomy 1? The most straightforward approach would be to test an interaction such as
(ChIP1 - Input1) - (ChIP2 - Input2), but because the library composition of the input samples is strikingly different from that of the ChIP samples, the underlying assumptions will probably be violated. For now, I would normalize the data using the 10 kb bin strategy suggested in
csaw to account for compositional changes. In this thread (https://support.bioconductor.org/p/82099/) it was suggested to ignore problematic regions, but since we are talking about an entire chromosome here, that is not an option.
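To make the interaction above concrete, here is a toy calculation in plain Python (not edgeR code). It is only a sketch under strong assumptions: expected counts scale linearly with chromosome copy number, the true enrichment is identical in both lines, and library-size and composition effects are ignored entirely. All names and numbers (`COPY_NUMBER`, `ENRICHMENT`, `INPUT_RATE`) are hypothetical and chosen for illustration.

```python
import math

# Hypothetical toy values; none of these numbers come from real data.
COPY_NUMBER = {"line1": 3, "line2": 2}      # chr1 copies: trisomic vs. diploid line
ENRICHMENT = {"line1": 8.0, "line2": 8.0}   # true ChIP enrichment over input (equal by assumption)
INPUT_RATE = 10.0                           # expected input reads per chromosome copy in one window


def expected_counts(line):
    """Expected (ChIP, input) counts for one chr1 window.

    Assumes counts scale linearly with copy number and ignores
    sequencing depth and composition bias.
    """
    input_count = INPUT_RATE * COPY_NUMBER[line]
    chip_count = input_count * ENRICHMENT[line]
    return chip_count, input_count


def log2_ratio(a, b):
    """log2 fold change of a over b."""
    return math.log2(a / b)


chip1, input1 = expected_counts("line1")
chip2, input2 = expected_counts("line2")

# Naive ChIP-vs-ChIP comparison: absorbs the copy-number difference.
naive_lfc = log2_ratio(chip1, chip2)

# Interaction on the log scale: (ChIP1 - Input1) - (ChIP2 - Input2).
interaction_lfc = log2_ratio(chip1, input1) - log2_ratio(chip2, input2)

print(f"naive log2FC:       {naive_lfc:.3f}")
print(f"interaction log2FC: {interaction_lfc:.3f}")
```

In this idealized setting the naive comparison reports a log2 fold change of log2(3/2) although the mark itself is unchanged, while the interaction cancels the copy-number term. Whether this carries over to real data depends on the normalization concerns raised above, which the sketch deliberately leaves out.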
Your opinions are appreciated.