Normalization - DESeq2 for Differential Binding using ChIP-seq data
p2016k
@p2016k-10767

Hi All,

I have 2 control and 2 treated replicates. I am running DESeq2 to identify differentially bound regions from ChIP-seq samples. However, the samples differ in their ChIP enrichment, which is distorting the results. Is there a way to ask DESeq2 to normalize the data and correct for this enrichment bias (for example, by correcting for a batch effect)?

Many thanks,

Tags: deseq2, chip-seq, differential binding
@mikelove

I don't have any recommendations here for ChIP-seq. It would help if you posted more information about these samples. Are you saying you have 4 ChIP samples, or 2 ChIP and 2 input?

As for batch effects, these can only be identified if there are differences across batches in your data. Were the 4 samples processed in different batches? Were the batches balanced or confounded with treatment?

p2016k
@p2016k-10767

Dear Mike,

Sorry for the confusion.

- I have 2 ChIP samples with DMSO (no treatment) and 2 ChIP samples with drug treatment. There is no input, so this gives 4 ChIP samples in total.

- 1 DMSO and 1 treated sample were processed together, and the other DMSO and treated sample were processed together too. So yes, we have 2 batches.
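The two-batch layout above can be written down as a sample table and checked for confounding before anything is fitted. A minimal base-R sketch; the sample names and batch labels (b1, b2) are hypothetical placeholders, and the rank check simply asks whether batch and condition effects can be estimated separately in a ~batch + condition model.

```r
# Hypothetical sample table for the design described above:
# each batch holds one DMSO and one drug-treated ChIP sample.
coldata <- data.frame(
  condition = factor(c("DMSO", "drug", "DMSO", "drug"), levels = c("DMSO", "drug")),
  batch     = factor(c("b1", "b1", "b2", "b2")),
  row.names = c("DMSO_1", "drug_1", "DMSO_2", "drug_2")
)

# Cross-tabulate batch against condition; every cell is 1 here,
# so the design is balanced (both conditions appear in every batch).
tab <- table(coldata$batch, coldata$condition)
print(tab)
balanced <- all(tab > 0)

# A batch + condition model is estimable only if its design matrix has full
# column rank; a rank deficit means batch is confounded with condition.
full_rank <- function(df) {
  mm <- model.matrix(~ batch + condition, data = df)
  qr(mm)$rank == ncol(mm)
}
full_rank(coldata)     # TRUE for the balanced layout

# Counter-example: if each batch held only one condition, the batch and
# condition columns of the design matrix would be identical.
confounded <- coldata
confounded$batch <- factor(c("b1", "b2", "b1", "b2"))
full_rank(confounded)  # FALSE
```

Because each batch contains both conditions, a batch term can be fitted alongside the treatment term; in the confounded counter-example no model could separate the two.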

I want to see the drug effect on binding, so I am using DMSO as the reference. One DMSO ChIP sample is of lower quality than the other DMSO sample, which brings its enrichment down and slightly closer to the treated samples, yielding fewer reads in called peaks. Can I correct for this bias, other than by using the default parameters in DESeq2 (i.e. the fitting parameter ...)?

Many thanks again,


First, I'd make a PCA plot (see the DESeq2 vignette) to help visualize the differences between samples.

Then you can use a design of ~batch + condition, which will control for differences across batches and let you perform cleaner inference on condition. You'll want to make an MA plot to see whether the results look reasonable: is there a bulk of non-differentially bound regions with log2 fold changes on the x-axis (i.e., near zero) that allows you to come up with reasonable size factors? Also, I would run DESeq() with betaPrior=FALSE first, as it's not clear to me that with ChIP-seq you will necessarily have a large set of non-differentially bound regions.
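A minimal sketch of this advice in DESeq2, using simulated counts from makeExampleDESeqDataSet() in place of real region counts. The condition labels and batch names are assumptions chosen to mirror the two-batch layout described in the thread; with real data you would build the DESeqDataSet from your own region-by-sample count matrix.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for a regions x samples ChIP-seq matrix.
dds <- makeExampleDESeqDataSet(n = 1000, m = 4)

# Hypothetical labels: two DMSO and two drug samples,
# with one sample of each condition per batch.
dds$condition <- factor(c("DMSO", "DMSO", "drug", "drug"), levels = c("DMSO", "drug"))
dds$batch     <- factor(c("b1", "b2", "b1", "b2"))
design(dds)   <- ~ batch + condition

# QC first: PCA on variance-stabilized counts, labelled by condition and batch.
dds <- estimateSizeFactors(dds)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
print(plotPCA(vsd, intgroup = c("condition", "batch")))

# Fit without the log2 fold change prior, then test drug vs DMSO;
# the batch term absorbs batch-to-batch differences.
dds <- DESeq(dds, betaPrior = FALSE)
res <- results(dds, contrast = c("condition", "drug", "DMSO"))

# MA plot: check for a bulk of regions centred on log2 fold change 0.
plotMA(res)
summary(res)
```

With only four samples and three coefficients, a single residual degree of freedom remains for dispersion estimation, so the results should be read with that in mind.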

How are you picking the regions?


The ~batch + condition design made a small difference: some differential binding is observed when this design is specified.

I always make PCA and MA plots. PCA does show, on PC1, that one DMSO replicate is close to both treated samples.

In our case, we do expect many regions to lose binding upon treatment. So I guess that I should keep betaPrior=TRUE. Does that make sense?

Regarding the regions that I am picking, I considered two sets in separate runs:

1) Since our factor of interest also binds at TSSs, I picked TSS -/+ 1 kb (please comment if you think this is not a wise decision).

2) Since I am using DMSO as the reference, I picked DMSO peaks to do differential binding on (I also tried merged peaks of DMSO and treated samples, but I did not find this approach clean, since we would be dealing with "pseudo peaks").
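Option 1) amounts to building fixed windows of 1 kb either side of each TSS. A base-R sketch, assuming a hypothetical table of 1-based TSS coordinates; with Bioconductor annotation objects, the promoters() method for GRanges builds comparable windows directly.

```r
# Hypothetical TSS table (1-based positions); in practice this would be
# derived from a gene annotation for your genome build.
tss <- data.frame(
  gene   = c("geneA", "geneB", "geneC"),
  chrom  = c("chr1", "chr1", "chr2"),
  tss    = c(500, 25000, 120000),
  strand = c("+", "-", "+")
)

flank <- 1000  # TSS -/+ 1 kb, as in option 1)

# The window is symmetric around the TSS, so strand does not change the
# coordinates. Starts are clipped at 1; ends are NOT clipped to the
# chromosome length here, which real code should also do.
windows <- data.frame(
  gene  = tss$gene,
  chrom = tss$chrom,
  start = pmax(1, tss$tss - flank),
  end   = tss$tss + flank
)
print(windows)
```

Genes with nearby TSSs get overlapping windows, so some reads can be counted in more than one region.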

I would be happy to share a sample run by private email, if you like.


I would only use betaPrior=TRUE on ChIP-seq data if there was a bulk of regions falling around the x-axis when you do:

res <- results(dds, addMLE=TRUE)   # dds fitted with DESeq(dds, betaPrior=TRUE)
plotMA(res, MLE=TRUE)              # MA plot of the unshrunken (MLE) log2 fold changes

(this is the same MA plot you would get by running DESeq() with betaPrior=FALSE)
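The MA plot is the check described above. As an optional numeric companion, the hypothetical helper below reports what fraction of regions with reasonable counts have a log2 fold change within a small band around zero. The column name (assumed here to be lfcMLE, the unshrunken estimates added by results(..., addMLE=TRUE)), the 0.5 band and the baseMean cut-off of 10 are all assumptions; pass lfc_col = "log2FoldChange" for a fit with betaPrior=FALSE.

```r
# Hypothetical helper: fraction of regions (above a baseMean cut-off) whose
# log2 fold change lies within +/- band of zero, i.e. close to the x-axis
# of the MA plot. Accepts a data.frame or a DESeq2 results table.
frac_near_zero <- function(res, lfc_col = "lfcMLE", band = 0.5, min_mean = 10) {
  res <- as.data.frame(res)
  lfc <- res[[lfc_col]]
  ok  <- !is.na(lfc) & !is.na(res$baseMean) & res$baseMean >= min_mean
  mean(abs(lfc[ok]) < band)
}

# Toy table standing in for results(dds, addMLE = TRUE).
toy <- data.frame(
  baseMean = c(5, 20, 30, 40, 50),
  lfcMLE   = c(3, 0.1, -0.2, 2, NA)
)
frac_near_zero(toy)  # 2 of the 3 usable regions are within 0.5 of zero
```

A high fraction is consistent with a bulk of regions sitting on the x-axis, but it is a rough summary and does not replace looking at the plot itself.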

I believe you lose Type I error control if you use only one set of peaks to define the regions. See this paper from Aaron Lun and Gordon Smyth on how to select regions while maintaining Type I error control:

http://nar.oxfordjournals.org/content/42/11/e95.full
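One way to make region selection blind to the treatment, sketched here in base R as an illustration in the spirit of that paper rather than its exact procedure, is to rank candidate regions by average abundance across all four libraries without looking at condition labels. The count matrix, the library sizes, the pseudocount and the number of regions kept are all hypothetical.

```r
# Hypothetical counts: rows are candidate regions, columns are the four ChIP libraries.
counts <- matrix(
  c(100, 120,  90, 110,
      5,   3,   8,   4,
     50,  10,  60,  12,
      0,   1,   0,   2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("region", 1:4),
                  c("DMSO_1", "DMSO_2", "drug_1", "drug_2"))
)

# Hypothetical total mapped reads per library, used for CPM scaling.
lib_size <- c(DMSO_1 = 2.0e7, DMSO_2 = 2.5e7, drug_1 = 1.8e7, drug_2 = 2.2e7)

# Average log2 counts-per-million over ALL libraries (pseudocount 0.5);
# condition labels are never used, so selection does not depend on
# which samples are treated.
cpm <- sweep(counts, 2, lib_size[colnames(counts)], "/") * 1e6
ave_logcpm <- rowMeans(log2(cpm + 0.5))

# Keep the n_keep most abundant regions (n_keep is an arbitrary choice here).
n_keep <- 2
keep <- rank(-ave_logcpm, ties.method = "first") <= n_keep
kept_regions <- rownames(counts)[keep]
print(kept_regions)
```

The kept regions would then go into DESeq2 as usual; a fixed abundance threshold instead of a top-N cut-off works the same way.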


Thank you again Mike, that was very helpful.

It looks like betaPrior should not be used in my case.

The paper was also helpful. However, since my signal is relatively diffuse (i.e., like chromatin marks), using peaks is not optimal. I will use defined regions (TSS -/+ 1 kb) to do my differential binding on. Of course, intergenic regions will be missed in that case.

Best