Question: two factor chip with diffbind and edgeR (or DESeq2).
21 months ago, james.dalgleish wrote:

Is it possible to run a ChIP-seq experiment with two factors: treatment (treated vs. untreated) and antibody (antibody of interest vs. control Ig antibody)? I believe one could do this with edgeR by taking count data from the ChIP-seq runs, creating a DGEList with the group representing each individual ChIP-seq run, and then creating a design matrix, calculating normalization factors, estimating dispersions, and running glmQLFTest (following p.8 of the edgeR manual)?

Answer: two factor chip with diffbind and edgeR (or DESeq2).
21 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

What do you mean by each individual ChIP-seq run? The ChIP sample and its negative control? If so, the answer is technically yes. You can use the "run" as the blocking factor and have another set of terms for the treatment-specific log-fold change of the ChIP over the control. This can be fed through the GLM machinery in edgeR, e.g., to compare the log-fold changes between treatment conditions to identify DB regions.

In practice, this may not do what you expect. The argument for using the log-fold change in the differential comparisons is that chromatin states change across conditions, altering accessibility and thus background coverage. The aim is to compute a condition-specific log-fold change to "cancel out" the change in background coverage that isn't that interesting to us. Unfortunately, changes in chromatin state are often correlated with actual changes in binding, i.e., you get more protein binding at a location because the chromatin opens up. This means that any adjustment for chromatin state would also cancel out some or all of the changes in binding.

For example, let's say that my DNA is twice as open in my treatment condition compared to non-treatment at a particular genomic site. As a result of the increased accessibility, I also have twice as much binding of my protein of interest at this site. The two effects cancel out when I compute my log-fold change of ChIP over control for each condition, rendering me unable to detect differential binding between conditions. This is an inevitable result of "subtracting" the input effect in a log-link model; see A: csaw with negative controls for more details.
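The cancellation in this example can be checked with a few lines of arithmetic. This is only an illustrative sketch: the coverage values are made up, and the variable names are hypothetical rather than taken from edgeR or any other package.

```python
from math import log2

# Hypothetical coverage at one genomic site (made-up numbers).
# Treatment doubles accessibility, so input and ChIP signal both double.
coverage = {
    "untreated": {"chip": 40.0, "input": 10.0},
    "treated": {"chip": 80.0, "input": 20.0},
}

# Condition-specific log-fold change of ChIP over input.
lfc = {cond: log2(v["chip"] / v["input"]) for cond, v in coverage.items()}

# Comparing ChIP coverage directly between conditions sees the doubling...
direct = log2(coverage["treated"]["chip"] / coverage["untreated"]["chip"])

# ...but comparing the input-adjusted log-fold changes does not.
adjusted = lfc["treated"] - lfc["untreated"]

print(f"direct ChIP log2FC: {direct:.1f}")          # 1.0
print(f"input-adjusted difference: {adjusted:.1f}")  # 0.0
```

The input-adjusted difference is zero even though binding has genuinely doubled, which is the loss of power described above.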


Thanks for the response. Doesn't edgeR expect count data? Wouldn't it be better to subtract the input control with bedtools subtract, put the resulting counts into a count matrix, and then perform a standard edgeR analysis? Perhaps fold change is a valid way to go about it. I'm open to that idea, but it would seem that edgeR expects count data.

written 21 months ago by james.dalgleish

There seem to be a series of misunderstandings here, so let me clarify.

  1. Yes, edgeR does expect count data. But you can still compare log-fold changes between conditions if you set up your GLM correctly. If one of your factors is ChIP/control and the other is your treatment, and you set up an interaction model in your design matrix, this is equivalent to testing for a significant interaction term.
  2. Having said that, I am not recommending this approach; see my answer above.
  3. Subtracting counts is a bad idea if you intend to use edgeR on the resulting values for differential testing. The same is true for DESeq; see A: DESeq2 for ChIP-seq differential peaks for Mike's take on this.
modified 21 months ago • written 21 months ago by Aaron Lun
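One way to see why subtraction causes trouble (point 3 above) is that the result no longer behaves like a count. The sketch below is an illustration under simple assumptions (independent Poisson counts, made-up numbers). It is not taken from the linked post and may not cover every reason given there.

```python
# Made-up ChIP and input counts for three windows.
chip_counts = [5, 12, 3]
input_counts = [8, 4, 3]

# Naive subtraction can go negative, which count models cannot represent.
subtracted = [c - i for c, i in zip(chip_counts, input_counts)]
print(subtracted)  # [-3, 8, 0]

# For independent Poisson counts, variances add under subtraction.
chip_mean, input_mean = 100.0, 90.0
diff_mean = chip_mean - input_mean      # 10.0
diff_variance = chip_mean + input_mean  # 190.0, Var(X - Y) = Var(X) + Var(Y)

# A Poisson count with mean 10 would have variance 10, so treating the
# difference as a count understates its variability by this factor.
print(diff_variance / diff_mean)  # 19.0
```

Under these assumptions, a model that takes the subtracted values at face value would see far less variability than is really there.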

Essentially, you recommend dropping input controls entirely then?

written 21 months ago by james.dalgleish

Yes, or using GreyListChIP if you are particularly concerned about changes in chromatin state. The idea is to simply remove problematic regions with high input coverage, rather than trying to be too clever about it and force the inputs into the differential analysis somehow.

modified 21 months ago • written 21 months ago by Aaron Lun


One way to use input controls is via the GreyListChIP Bioconductor package. It uses input controls to identify regions of the genome with high coverage in the input, which tend to confuse peak callers and produce a lot of spurious peaks. You use the inputs to identify these regions, then remove reads aligning to these regions from the analysis completely (prior to peak calling). In one case, it eliminated ~1000 noise peaks and changed the biological interpretation of the result.

Disclaimer: I'm the author of GreyListChIP, so I might be a tiny bit biased... :)

(Edited to add: yeah, what Aaron said... ;) )

modified 21 months ago • written 21 months ago by Gord Brown
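The general idea of grey-listing (flag regions with high input coverage, then drop reads that fall in them before peak calling) can be sketched as follows. This is not the GreyListChIP API: the bin layout, the cutoff, and the function names are all hypothetical, and the fixed threshold merely stands in for whatever statistical criterion the package actually applies.

```python
# Hypothetical input-control coverage in fixed-width bins on one chromosome.
BIN_WIDTH = 1000
input_bin_counts = [3, 4, 250, 310, 5, 2]  # made-up read counts per bin
THRESHOLD = 100  # assumed cutoff for "high input coverage"

def grey_regions(counts, width, threshold):
    """Return half-open (start, end) intervals for bins above the threshold."""
    return [(i * width, (i + 1) * width)
            for i, n in enumerate(counts) if n > threshold]

def keep_read(start, end, regions):
    """True if the half-open read interval overlaps no grey region."""
    return all(end <= g_start or start >= g_end for g_start, g_end in regions)

grey = grey_regions(input_bin_counts, BIN_WIDTH, THRESHOLD)

# Made-up ChIP read intervals; drop those overlapping grey regions
# before peak calling.
chip_reads = [(150, 200), (2100, 2150), (3990, 4040), (4500, 4550)]
filtered = [r for r in chip_reads if keep_read(*r, grey)]

print(grey)      # [(2000, 3000), (3000, 4000)]
print(filtered)  # [(150, 200), (4500, 4550)]
```

Note that the inputs are used only to decide which regions to exclude; they never enter the differential analysis itself.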