DiffBind with ChIPseq from histone analysis: EdgeR or Deseq2?
1
0
Entering edit mode
Giulia • 0
@06fc245c
Last seen 10 weeks ago
United Kingdom

Hello,

I am have ChIPseq data from histone marks from 2 different condition (mock and treated with 2 biological replicas each) and the aim of my analysis is to study whether a particular histone mark is enriched in one condition versus the other. I am new to bioinformatic and I don't have any statistical training and I am trying to teach myself how to normalise the data and perform the differential peak analysis. From my understanding, without prior knowledge and/or assumptions on biases/expected results, the best way to normalise my data is using background=TRUE and normalize=DBA_NORM_NATIVE. Established that, I am having trouble understanding with type of analysis method is best to use in my case. After normalisation, I set the contrast:

WTtest_model <- dba.contrast(WTtest_norm, reorderMeta=list(Condition="WT_mock"), minMembers = 2)

WTtest_model

Samples, 36975 sites in matrix:

     ID  Factor  Condition Replicate    Reads FRiP
Sample6 H3K4me3   WT_mock         1 14524870 0.50
Sample11 H3K4me3    WT_mock         2 15263769 0.44
Sample17 H3K4me3 WT_treated         1 14206596 0.40
Sample22 H3K4me3 WT_treated         2 15396209 0.40

I then do the differential peak analysis:

WTtest_res <- dba.analyze(WTtest_model, method=DBA_ALL_METHODS)

dba.show(WTtest_res,bContrasts=TRUE)

 Factor      Group Samples  Group2 Samples2 DB.edgeR DB.DESeq2
1 Condition WT_treated       2 WT_mock        2     9690       144

The results shows a big difference between the amount of differential bound regions identified with edgeR or DEseq2(with 98% less peaks identified by DEseq2 in respect to edgeR). What is the reason behind such difference? Which analysis method would best suit my type of study/data and why?

Second question, in the example I have used the data from H3K4me3 ChIPseq experiment. With other histone mark generating broader peaks (like H3K9me2 or H3K27ac) should I change anything in the script?

Any help is greatly appreciated as I cannot really get my head around this.

DESeq2 ChIPSeq DiffBind edgeR HistoneModification • 1.5k views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 11 days ago
Cambridge, UK

It is unusual to see such a big difference between the two analysis methods, although it is more likely when you have only two replicates due to how the estimation calculations are performed.

I would do a couple of things here. First I would look at the MA plots to see if there are any big differences dur to normalization:

par(mfrow=c(2,1))
dba.plotMA(WTtest_res, method=DBA_EDGER,  sub="edgeR")
dba.plotMA(WTtest_res, method=DBA_DESEQ2, sub="DESeq2")

I would also try the default normalization (using only library sizes) to see how similar the results are to these and to each other.

Regarding wider marks, you can play with the value of the summits parameter when calling dba.count(). The default value of summits=200 results in 401bp peaks, which is optimized for more ChIP-seq protocols that result in fairly large fragment sizes (compared with the sizes of the actual binding sites). However I usually use this setting for more widely-distributed marks as well. it is not essential to capture the entire enriched region when doing a differential analysis. Using a narrower region that is more likely to be mostly enriched (less background) can more reliably identify differential enrichment between regions, depending on the biological question you are addressing. For example, if you are using H3K27ac an an active enhancer mark, capturing a subset of the enhancer as direct you to where enhancer status is changing. In cases where the precise boundaries are important, you can include a downstream step to adjust the re-centred peak based on the original peak calls.

ADD COMMENT
0
Entering edit mode

Hi Rory,

Thank you for your quick reply and advice on the matter. The following are the MAplots for the analysis using the background=TRUE and normalize=DBA_NORM_NATIVE parameters. enter image description here

While using the full library normalisation I get this result:

dba.show(WTtest_res2,bContrasts=TRUE)

 Factor      Group Samples  Group2 Samples2 DB.edgeR DB.DESeq2
Condition WT_treated       2 WT_mock        2     6901       142

And the following MAplots are:

enter image description here

For edgeR the distribution of the peaks in the plot do not seem to change much even though the difference in differential peaks called in bigger that for Deseq2. In the case of DEseq2, I can see that the distribution of the peaks in the plots is a bit more different, especially looking at the peaks in the left "tail". When I compare DEseq2 with EdgeR independently of the normalisation, there is quite a difference. But I would not know how to interpret this result or decide on which method is the best. What is your advice? Thanks a lot for you for your help again.

ADD REPLY

Login before adding your answer.

Traffic: 705 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6