Question

DiffBind with ChIPseq from histone analysis: EdgeR or Deseq2?

0

Entering edit mode

Giulia • 0

@06fc245c

Last seen 16 months ago

United Kingdom

Hello,

I am have ChIPseq data from histone marks from 2 different condition (mock and treated with 2 biological replicas each) and the aim of my analysis is to study whether a particular histone mark is enriched in one condition versus the other. I am new to bioinformatic and I don't have any statistical training and I am trying to teach myself how to normalise the data and perform the differential peak analysis. From my understanding, without prior knowledge and/or assumptions on biases/expected results, the best way to normalise my data is using background=TRUE and normalize=DBA_NORM_NATIVE. Established that, I am having trouble understanding with type of analysis method is best to use in my case. After normalisation, I set the contrast:

WTtest_model <- dba.contrast(WTtest_norm, reorderMeta=list(Condition="WT_mock"), minMembers = 2)

WTtest_model

Samples, 36975 sites in matrix:

     ID  Factor  Condition Replicate    Reads FRiP
Sample6 H3K4me3   WT_mock         1 14524870 0.50
Sample11 H3K4me3    WT_mock         2 15263769 0.44
Sample17 H3K4me3 WT_treated         1 14206596 0.40
Sample22 H3K4me3 WT_treated         2 15396209 0.40

I then do the differential peak analysis:

WTtest_res <- dba.analyze(WTtest_model, method=DBA_ALL_METHODS)

dba.show(WTtest_res,bContrasts=TRUE)

 Factor      Group Samples  Group2 Samples2 DB.edgeR DB.DESeq2
1 Condition WT_treated       2 WT_mock        2     9690       144

The results shows a big difference between the amount of differential bound regions identified with edgeR or DEseq2(with 98% less peaks identified by DEseq2 in respect to edgeR). What is the reason behind such difference? Which analysis method would best suit my type of study/data and why?

Second question, in the example I have used the data from H3K4me3 ChIPseq experiment. With other histone mark generating broader peaks (like H3K9me2 or H3K27ac) should I change anything in the script?

Any help is greatly appreciated as I cannot really get my head around this.

DESeq2 ChIPSeq DiffBind edgeR HistoneModification • 2.2k views

ADD COMMENT • link 2.8 years ago Giulia • 0

score 0 · Answer 1 · 2023-03-24

It is unusual to see such a big difference between the two analysis methods, although it is more likely when you have only two replicates due to how the estimation calculations are performed.

I would do a couple of things here. First I would look at the MA plots to see if there are any big differences dur to normalization:

par(mfrow=c(2,1))
dba.plotMA(WTtest_res, method=DBA_EDGER,  sub="edgeR")
dba.plotMA(WTtest_res, method=DBA_DESEQ2, sub="DESeq2")

I would also try the default normalization (using only library sizes) to see how similar the results are to these and to each other.

Regarding wider marks, you can play with the value of the summits parameter when calling dba.count(). The default value of summits=200 results in 401bp peaks, which is optimized for more ChIP-seq protocols that result in fairly large fragment sizes (compared with the sizes of the actual binding sites). However I usually use this setting for more widely-distributed marks as well. it is not essential to capture the entire enriched region when doing a differential analysis. Using a narrower region that is more likely to be mostly enriched (less background) can more reliably identify differential enrichment between regions, depending on the biological question you are addressing. For example, if you are using H3K27ac an an active enhancer mark, capturing a subset of the enhancer as direct you to where enhancer status is changing. In cases where the precise boundaries are important, you can include a downstream step to adjust the re-centred peak based on the original peak calls.