Two-factor ChIP-seq with DiffBind and edgeR (or DESeq2)
@jamesdalgleish-14561

Is it possible to analyze a ChIP-seq experiment with two factors: treatment (treated vs. untreated) and antibody (antibody of interest vs. control Ig antibody)? I believe one could do this in edgeR by building a DGEList from the ChIP-seq count data, with the group identifying each individual ChIP-seq run, and then creating a design matrix, calculating normalization factors, estimating dispersions, and running glmQLFTest (following p. 8 of the edgeR manual). Is that right?

diffbind edger • 1.6k views
Aaron Lun ★ 28k
@alun

What do you mean by each individual ChIP-seq run? The ChIP sample and its negative control? If so, the answer is technically yes. You can use the "run" as the blocking factor and have another set of terms for the treatment-specific log-fold change of the ChIP over the control. This can be fed through the GLM machinery in edgeR, e.g., to compare the log-fold changes between treatment conditions to identify DB regions.
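To make that layout concrete, here is a minimal sketch of such a design matrix. It is written in plain Python purely for illustration; the sample sheet, the column names and the `build_design` helper are all hypothetical, and in an actual edgeR analysis the equivalent matrix would normally be built in R with `model.matrix`.

```python
# Hypothetical sample sheet: each "run" pairs a ChIP library with its control.
samples = [
    # (run, treatment, antibody)
    ("run1", "untreated", "control"),
    ("run1", "untreated", "chip"),
    ("run2", "untreated", "control"),
    ("run2", "untreated", "chip"),
    ("run3", "treated", "control"),
    ("run3", "treated", "chip"),
    ("run4", "treated", "control"),
    ("run4", "treated", "chip"),
]


def build_design(samples):
    """One blocking column per run, plus one ChIP-over-control column per treatment."""
    runs = sorted({run for run, _, _ in samples})
    treatments = sorted({trt for _, trt, _ in samples})
    columns = runs + ["chip_in_" + trt for trt in treatments]
    rows = []
    for run, trt, antibody in samples:
        # Blocking terms: which run this library belongs to.
        row = [1 if run == r else 0 for r in runs]
        # Treatment-specific ChIP-over-control terms.
        row += [1 if (antibody == "chip" and trt == t) else 0 for t in treatments]
        rows.append(row)
    return columns, rows


columns, design = build_design(samples)

# Contrast comparing the ChIP-over-control log-fold change between treatments.
contrast = [0] * len(columns)
contrast[columns.index("chip_in_treated")] = 1
contrast[columns.index("chip_in_untreated")] = -1
```

The contrast vector is what would identify DB regions in this setup: it asks whether the ChIP-over-control log-fold change differs between the treated and untreated conditions.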

In practice, this may not do what you expect. The argument for using the log-fold change in the differential comparisons is that chromatin states change across conditions, altering accessibility and thus background coverage. The aim is to compute a condition-specific log-fold change to "cancel out" the change in background coverage that isn't that interesting to us. Unfortunately, changes in chromatin state are often correlated with actual changes in binding, i.e., you get more protein binding at a location because the chromatin opens up. This means that any adjustment for chromatin state would also cancel out some or all of the changes in binding.

For example, let's say that my DNA is twice as open in my treatment condition compared to non-treatment at a particular genomic site. As a result of the increased accessibility, I also have twice as much binding of my protein of interest at this site. The two effects cancel out when I compute my log-fold change for this condition, rendering me unable to detect differential binding between conditions. This is an inevitable result of "subtracting" the input effect in a log-link model; see "A: csaw with negative controls" for more details.
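The arithmetic of that example can be written out as a small sketch. The coverage values below are made up; only the two-fold ratios come from the example above.

```python
import math

# Hypothetical mean coverage at one site (arbitrary units).
untreated = {"control": 10.0, "chip": 40.0}
# Treatment doubles accessibility, so background and binding both double.
treated = {"control": 20.0, "chip": 80.0}


def log_fc(condition):
    """log2 fold change of ChIP over control within one condition."""
    return math.log2(condition["chip"] / condition["control"])


# Direct ChIP-vs-ChIP comparison between conditions: the doubling is visible.
chip_only = math.log2(treated["chip"] / untreated["chip"])

# Comparison of control-adjusted log-fold changes: the doubling cancels out.
adjusted = log_fc(treated) - log_fc(untreated)

print(chip_only, adjusted)
```

With these numbers the direct comparison gives a log2 fold change of 1, while the control-adjusted comparison gives 0, so the change in binding is no longer detectable.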


Thanks for the response. Doesn't edgeR expect count data? Wouldn't it be better to subtract the input control using bedtools subtract, feed the result into a count matrix, and then perform a standard edgeR analysis? Perhaps comparing fold changes is a valid way to go about it, and I'm open to that idea, but it seems that edgeR expects count data.


There seem to be a series of misunderstandings here, so let me clarify.

  1. Yes, edgeR does expect count data. But you can still compare log-fold changes between conditions if you set up your GLM correctly. If one of your factors is ChIP/control and the other is your treatment, and you set up an interaction model in your design matrix, this is equivalent to testing for a significant interaction term.
  2. Having said that, I am not recommending this approach; see my answer above.
  3. Subtracting counts is a bad idea if you intend to use edgeR on the resulting values for differential testing. The same is true for DESeq; see "A: DESeq2 for ChIP-seq differential peaks" for Mike's take on this.
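
One way to see the problem with subtraction in point 3 is sketched below. It assumes independent Poisson counts for simplicity, which is not the exact model edgeR uses and not necessarily the argument made in the linked post; all numbers are hypothetical.

```python
# Hypothetical Poisson means for one region.
mu_chip, mu_input = 30.0, 20.0

# For independent Poisson counts, means subtract but variances add.
mean_diff = mu_chip - mu_input  # mean of (ChIP - input)
var_diff = mu_chip + mu_input   # variance of (ChIP - input)

# A genuine Poisson count with this mean would have variance equal to the mean,
# so the subtracted value is far noisier than a count model would assume.
variance_inflation = var_diff / mean_diff

# Subtraction can also produce values that are not counts at all.
observed_chip, observed_input = 12, 17
subtracted = observed_chip - observed_input

print(mean_diff, var_diff, variance_inflation, subtracted)
```

Here the subtracted value has mean 10 but variance 50, and a single region can end up with a negative "count", neither of which fits the mean-variance relationship that count-based methods rely on.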

Essentially, you recommend dropping input controls entirely then?


Yes, or using GreyListChIP if you are particularly concerned about changes in chromatin state. The idea is to simply remove problematic regions with high input coverage, rather than trying to be too clever about it and force the inputs into the differential analysis somehow.


Hi,

One way to use input controls is via the GreyListChIP Bioconductor package.  It uses input controls to identify regions of the genome with high coverage in the input, which tend to confuse peak callers and produce a lot of spurious peaks.  You use the inputs to identify these regions, then remove reads aligning to these regions from analysis completely (prior to peak calling).  In one case, it eliminated ~1000 noise peaks, and changed the biological interpretation of the result.
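
As a rough illustration of the general idea only (this is not GreyListChIP's implementation, and it does not reproduce how the package chooses its cutoff), the filtering step might be sketched like this. The regions, read positions, fixed `THRESHOLD` and the `in_grey_region` helper are all hypothetical.

```python
# Hypothetical input coverage per region: (chrom, start, end, input_reads).
input_coverage = [
    ("chr1", 0, 1000, 12),
    ("chr1", 1000, 2000, 950),
    ("chr1", 2000, 3000, 15),
]

# Hypothetical fixed cutoff, chosen by hand for this toy example.
THRESHOLD = 100

# Regions with unusually high input coverage form the "grey list".
grey_regions = [(c, s, e) for c, s, e, n in input_coverage if n > THRESHOLD]


def in_grey_region(chrom, pos, regions):
    """True if a read position falls inside any grey region (half-open intervals)."""
    return any(chrom == c and s <= pos < e for c, s, e in regions)


# Hypothetical ChIP read positions: (chrom, position).
chip_reads = [("chr1", 150), ("chr1", 1200), ("chr1", 1999), ("chr1", 2500)]

# Drop reads in grey regions before peak calling.
kept_reads = [r for r in chip_reads if not in_grey_region(r[0], r[1], grey_regions)]
```

Only the reads outside the high-input region survive, so a peak caller run on `kept_reads` never sees the problematic region.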

Disclaimer: I'm the author of GreyListChIP, so I might be a tiny bit biased... :)

(Edited to add: yeah, what Aaron said... ;) )
