DiffBind Analysis with more than 2 samples?


I have 2 replicates for each of 4 conditions, so 8 samples in total (P300 ChIP-seq).

All are from the same cell type: wild type (WT) plus knock-outs of different transcription factors.

I want to get "modules" of peaks gained and/or lost in each sample (enhancers created or repressed by each TF).

How might I set this up in DiffBind? 


Right now I have made my samples object like this:

8 Samples, 30757 sites in matrix:
                 ID Tissue Factor      Condition Treatment Replicate Caller Intervals FRiP
1              WT.1    Th2   P300             WT        5d         1 counts     30757 0.18
2              WT.2    Th2   P300             WT        5d         2 counts     30757 0.14
3       Gata3_cKO.1    Th2   P300       Gata3_KO        5d         1 counts     30757 0.19
4       Gata3_cKO.2    Th2   P300       Gata3_KO        5d         2 counts     30757 0.13
5        Stat6_KO.1    Th2   P300       Stat6_KO        5d         1 counts     30757 0.17
6        Stat6_KO.2    Th2   P300       Stat6_KO        5d         2 counts     30757 0.15
7 Gata3_Stat6_dKO.1    Th2   P300 Gata3_Stat6_KO        5d         1 counts     30757 0.14
8 Gata3_Stat6_dKO.2    Th2   P300 Gata3_Stat6_KO        5d         2 counts     30757 0.12


Many thanks for any help

diffbind chip-seq
Rory Stark
CRUK, Cambridge, UK

Hi Sarah-

First, a caveat: having only two replicates of each condition is probably not enough to get good results out of this type of analysis.

That said, the most straightforward design would be to calculate the impact of each knock-out separately, as compared to WT. You can add these three contrasts as follows:

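> # group2 is given explicitly; if omitted, dba.contrast() uses all samples not in group1
> myDBA <- dba.contrast(myDBA, group1=myDBA$masks$Gata3_KO, group2=myDBA$masks$WT,
                        name1="Gata3_KO", name2="WT")
> myDBA <- dba.contrast(myDBA, group1=myDBA$masks$Stat6_KO, group2=myDBA$masks$WT,
                        name1="Stat6_KO", name2="WT")
> myDBA <- dba.contrast(myDBA, group1=myDBA$masks$Gata3_Stat6_KO, group2=myDBA$masks$WT,
                        name1="Gata3_Stat6_KO", name2="WT")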
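
Each `myDBA$masks$...` entry is a logical vector with one value per sample. The base R sketch below mimics that with a hypothetical `condition` vector (copied from the Condition column of the sample table) instead of a real DBA object. It shows what the WT and knock-out masks select, and that the complement of a knock-out mask is not the same as the WT mask. As far as I know, `dba.contrast()` falls back to that complement when `group2` is not supplied, so treat this as an illustration of the idea rather than of DiffBind internals.

```r
# Hypothetical stand-in for the Condition column of the sample table
condition <- rep(c("WT", "Gata3_KO", "Stat6_KO", "Gata3_Stat6_KO"), each = 2)

# One logical mask per condition, analogous to myDBA$masks$<name>
masks <- sapply(unique(condition), function(cond) condition == cond,
                simplify = FALSE)

which(masks$Gata3_KO)    # indices of the Gata3_KO replicates
which(masks$WT)          # indices of the WT replicates
which(!masks$Gata3_KO)   # every other sample, not only WT
```

The last line is the reason to name both groups whenever the comparison is meant to be knock-out vs. WT only.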

If you are interested in comparing how binding changes between the double knock-out and each of the single knock-outs, you could add the following two contrasts as well:

> myDBA <- dba.contrast(myDBA, group1=myDBA$masks$Gata3_KO, group2=myDBA$masks$Gata3_Stat6_KO,
                        name1="Gata3_KO", name2="Gata3_Stat6_KO")
> myDBA <- dba.contrast(myDBA, group1=myDBA$masks$Stat6_KO, group2=myDBA$masks$Gata3_Stat6_KO,
                        name1="Stat6_KO", name2="Gata3_Stat6_KO")

Then run an analysis:

> myDBA <- dba.analyze(myDBA)

If you want to, say, see how the changed sites overlap between the three knockouts (compared to WT), you can get a Venn diagram as follows:

> dbSites <- dba.report(myDBA, bDB=TRUE)
> dba.plotVenn(dbSites, 1:3)
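
If you want lists of sites rather than a picture, the three-way overlap comes down to set operations on site identifiers. Here is a toy sketch in base R; the site IDs are made up (not real results) and the names `db`, `in_all` and `only` are hypothetical. With real data the identifiers would have to be derived from the reported sites, for example from their genomic coordinates.

```r
# Made-up site IDs standing in for the differentially bound sites of each
# knock-out vs. WT contrast (illustrative only, not real results)
db <- list(
  Gata3_KO       = c("s1", "s2", "s3", "s5"),
  Stat6_KO       = c("s2", "s4", "s5"),
  Gata3_Stat6_KO = c("s2", "s3", "s4", "s6")
)

# Sites changed in all three knock-outs
in_all <- Reduce(intersect, db)

# Sites changed in exactly one knock-out
only <- lapply(names(db), function(n) {
  setdiff(db[[n]], unlist(db[names(db) != n]))
})
names(only) <- names(db)
```

Each element of `only` is a candidate "module" specific to one knock-out, and `in_all` holds the sites shared by all three.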



Dear Rory,

I am new to bioinformatics and DiffBind. Firstly, thank you for this very useful tool and the handy guides! I have some additional questions about using DiffBind with more than 2 conditions.

Currently I have 4 different conditions (representing different "subtypes" of a disease condition) and would like to identify differential peaks among all 4 different conditions (ideally identify peaks that are specific to each subtype).

I have set up all possible contrasts (i.e. 6 different contrasts). However, as far as I know, DiffBind only identifies differential regions between 2 conditions. As a result, when I plot a heatmap of differential peak regions, I can only visualize one contrast at a time, and it clusters only the 2 conditions in that contrast.

Is it possible to use DiffBind to obtain a heatmap that clusters all 4 conditions according to the peak regions that are differential across all 4 conditions?

Please correct me if I have any misunderstanding. 

Thank you!

Best regards,


This is a new question, not a comment on the original question, and should be posted as such.

Given a set of differentially bound (DB) sites from a contrast, you can see how all the samples cluster, whether or not they are in the contrast, by setting the mask parameter in, e.g., dba.plotHeatmap() and dba.plotPCA(). So if you have 4 conditions and 3 replicates of each, you can set mask=1:12 and all the samples will be plotted using the DB sites from the specified contrast.
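
To make the idea concrete, here is a conceptual sketch in base R of what "plot all samples using the DB sites from one contrast" amounts to: keep only the rows for the DB sites, keep all sample columns, and cluster the columns. This is not DiffBind's internal code. The counts are invented, and `conds`, `pattern`, `noise` and `counts` are hypothetical names.

```r
# Hypothetical condition labels: 4 conditions x 3 replicates = 12 samples
conds <- rep(c("A", "B", "C", "D"), each = 3)

# Invented signal at 6 sites assumed to be DB in the A vs. B contrast
pattern <- cbind(A = c(8, 8, 8, 2, 2, 2),
                 B = c(2, 2, 2, 8, 8, 8),
                 C = c(8, 2, 8, 2, 8, 2),
                 D = c(2, 8, 2, 8, 2, 8))

# Small deterministic perturbation so that replicates are not identical
noise <- outer(1:6, 1:12, function(i, j) ((i * j) %% 3 - 1) * 0.1)

# Rows: DB sites from one contrast; columns: all 12 samples
counts <- pattern[, conds] + noise
colnames(counts) <- paste0(conds, ".", 1:3)

# Cluster all samples by correlation over the DB sites only
hc <- hclust(as.dist(1 - cor(counts)))
groups <- cutree(hc, k = 4)
```

In this toy example conditions C and D still form their own clusters even though they were not part of the contrast. In DiffBind itself, the reply above corresponds to passing both a contrast and a mask to dba.plotHeatmap() or dba.plotPCA().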


