DiffBind minOverlap parameter
1
0
Entering edit mode
Last seen 2.9 years ago

Hi, I'm trying to use DiffBind for my ChIPseq analysis, but I encountered the following "problem".

I loaded two patients' groups of samples using dba function and setting minOverlap equal to 2. Same setting also for:

my_peaks <- dba.count(my_peaks, bUseSummarizeOverlaps=F,minOverlap=2).

I defined the contrast and then I ran a differential analysis: my_peaks <- dba.analyze(my_peaks,method=DBA_DESEQ2)

But when I looked into this object: peaks_DF <- dba.report( my_peaks,bCalled=TRUE,th=0.1) I found some peaks present only in 1 one sample for the groupA, and only in one sample for the groupB.

Is there any possibility to avoid this and having only a minOverlap of 2 in each patients' group? I mean at least a minimum of overlap=2 in each considered category (>=2 in the groupA and >=2 in the group B, not >=2 groupA+groupB) to create a consensus on which perform differential analysis.

DiffBind minOverlap consensus peakset Job • 877 views
0
Entering edit mode
Rory Stark ★ 4.4k
@rory-stark-5741
Last seen 8 hours ago
CRUK, Cambridge, UK

You can make the consensus peakset in two steps: first making a consensus for each sample group with minOverlap=2, then taking the union of these with minOverlap=1.

Here's an example using the sample data, where we get the peaks that overlap in at least two of the Responsive samples and at least two of the Resistant samples, the count using all of these peaks:

# Load peak data
data(tamoxifen_peaks)

# Generate consensus for each condition
tamoxifen.consensus <- dba.peakset(tamoxifen,consensus = DBA_CONDITION, minOverlap=2)

# Examine sample groups consensus peaks and overlap
dba.show(tamoxifen.consensus,tamoxifen.consensus$masks$Consensus)
dba.plotVenn(tamoxifen.consensus,tamoxifen.consensus$masks$Consensus)

# Retrieve union of consensus peaks
consensus <- dba.peakset(tamoxifen.consensus, bRetrieve=TRUE,
peaks=tamoxifen.consensus$masks$Consensus,
minOverlap=1)

# Count using pre-computed consensus peakset
tamoxifen <- dba.count(tamoxifen, peaks=consensus)

0
Entering edit mode

Thank you, this is what we followed in generating the consensus peak set. May we clarify the following with you?

(1) With the minOverlap of 2 within group (above code), do we obtain: merged peak set per group, each merged peak set contains only peaks observed in at least 2 samples in that group, then consensus peak set is the union of the two merged peak sets? Meaning, peaks which show up in group A in at least two samples and in none of the group B samples are still retained in the consensus peak set?

(2) In the dba.plotVenn step above, we see 172,883 peaks overlapped between the two conditions. After running dba.count, the DBA object shows 295,676 Intervals for all samples, which we understand as being the consensus peak set. Is the venn diagram showing an overlap across all samples in the two conditions perhaps (is that why it is lower than 295,676)?

0
Entering edit mode

(1) Yes. You get a union of the consensus sets (peaks included even if they appear in only one set) because minOverlap=1 in the final call.

(2) In the venn diagram, the middle set are those that appear in both consensus sets (minOverlap=2). Since we took a union (minOverlap=1) , the final set is equal to the sum of all three parts of the venn.