Question

DiffBind minOverlap parameter

0

Entering edit mode

mademoisElle • 0

@mademoiselle-22194

Last seen 6.3 years ago

Hi, I'm trying to use DiffBind for my ChIPseq analysis, but I encountered the following "problem".

I loaded two patients' groups of samples using dba function and setting minOverlap equal to 2. Same setting also for:

my_peaks <- dba.count(my_peaks, bUseSummarizeOverlaps=F,minOverlap=2).

I defined the contrast and then I ran a differential analysis: my_peaks <- dba.analyze(my_peaks,method=DBA_DESEQ2)

But when I looked into this object: peaks_DF <- dba.report( my_peaks,bCalled=TRUE,th=0.1) I found some peaks present only in 1 one sample for the groupA, and only in one sample for the groupB.

Is there any possibility to avoid this and having only a minOverlap of 2 in each patients' group? I mean at least a minimum of overlap=2 in each considered category (>=2 in the groupA and >=2 in the group B, not >=2 groupA+groupB) to create a consensus on which perform differential analysis.

Thank you in advance for your help!

DiffBind minOverlap consensus peakset Job • 2.5k views

ADD COMMENT • link updated 4.1 years ago by Rory Stark ★ 5.2k • written 6.3 years ago by mademoisElle • 0

score 0 · Answer 1 · 2019-10-23

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 12 months ago

Cambridge, UK

You can make the consensus peakset in two steps: first making a consensus for each sample group with minOverlap=2, then taking the union of these with minOverlap=1.

Here's an example using the sample data, where we get the peaks that overlap in at least two of the Responsive samples and at least two of the Resistant samples, the count using all of these peaks:

# Load peak data
data(tamoxifen_peaks)

# Generate consensus for each condition
tamoxifen.consensus <- dba.peakset(tamoxifen,consensus = DBA_CONDITION, minOverlap=2)

# Examine sample groups consensus peaks and overlap
dba.show(tamoxifen.consensus,tamoxifen.consensus$masks$Consensus)
dba.plotVenn(tamoxifen.consensus,tamoxifen.consensus$masks$Consensus)

# Retrieve union of consensus peaks
consensus <- dba.peakset(tamoxifen.consensus, bRetrieve=TRUE,
                         peaks=tamoxifen.consensus$masks$Consensus,
                         minOverlap=1)

# Count using pre-computed consensus peakset
tamoxifen <- dba.count(tamoxifen, peaks=consensus)

ADD COMMENT • link 6.3 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Thank you, this is what we followed in generating the consensus peak set. May we clarify the following with you?

(1) With the minOverlap of 2 within group (above code), do we obtain: merged peak set per group, each merged peak set contains only peaks observed in at least 2 samples in that group, then consensus peak set is the union of the two merged peak sets? Meaning, peaks which show up in group A in at least two samples and in none of the group B samples are still retained in the consensus peak set?

(2) In the dba.plotVenn step above, we see 172,883 peaks overlapped between the two conditions. After running dba.count, the DBA object shows 295,676 Intervals for all samples, which we understand as being the consensus peak set. Is the venn diagram showing an overlap across all samples in the two conditions perhaps (is that why it is lower than 295,676)?

Thank you for your help in advance!

ADD REPLY • link 4.2 years ago pratima8589 • 0

0

Entering edit mode

(1) Yes. You get a union of the consensus sets (peaks included even if they appear in only one set) because minOverlap=1 in the final call.

(2) In the venn diagram, the middle set are those that appear in both consensus sets (minOverlap=2). Since we took a union (minOverlap=1) , the final set is equal to the sum of all three parts of the venn.

ADD REPLY • link 4.1 years ago Rory Stark ★ 5.2k