Question: DiffBind merging and consensus peaks
3.0 years ago, omiguele0 wrote:

Dear Bioconductor community,

I am interested in using DiffBind. I am following the procedure in "DiffBind: Differential binding analysis of ChIP-Seq peak data" and I am a bit confused about two points. I hope you can help me.

My first question is a bit general. If I understand correctly, from the very beginning (reading the peaksets) DiffBind only takes into account the peaks that are merged (shared) between all the samples. So if there were two experimental conditions with two replicates per condition, and a peak was consistently found in the two condition1 replicates but not in the condition2 replicates, this peak would not be taken into account. Is that correct?

My second question concerns the "Deriving consensus peaksets" part, on page 20, where the text says:

"Alternatively, a master consensus peakset could be generated, and reads counted, directly using dba.count: tamoxifen <- dba.count(tamoxifen, peaks=tamoxifen$masks$Consensus)"

If I try this, I get the following error:

"Error in pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score,  :
Can't count: some peaksets are not associated with a .bam file."

My consensus peaksets (one per set of replicates) are in the dba object, but they have no BAM files associated with them in the original sample sheet. Would you recommend merging the BAM files and loading a new sample sheet?

Regards,


Regarding the second issue, there was an error in that section of the vignette. I have fixed the text to explain more clearly how to count with a separately constructed consensus peakset, and the fix should be released soon as DiffBind 2.2.6.

-R

Answer: DiffBind merging and consensus peaks
3.0 years ago, Gord Brown (United Kingdom) wrote:

Hi,

In regard to your first question, you can control how many peak sets have to include a peak for it to be included in the analysis. In both dba and dba.count, the parameter minOverlap controls this: for example, if you supply the argument minOverlap=2, then any peak that occurs in at least 2 peak sets will be included.

I'll have to leave the second part to Rory... I don't really understand what he is (or you are) trying to accomplish there.

Cheers,

- Gord

Hi Dr. Brown,

Thank you for your answer. Maybe I am confused about the merging concept. For example, in the tamoxifen dataset, once the peaksets are loaded, the first line of the dba object says:

“11 Samples, 2603 sites in matrix (3558 total)”

The 2603 sites are the ones shared by at least two of the 11 datasets (minOverlap=2), but if I use minOverlap=0 (or minOverlap=1) I get all 3558 sites, because those are all the available sites. However, 3558 is not equal to the sum of the intervals in the 11 samples. Is this because you perform a “merge” (like a bedtools merge) on each of the 11 samples independently?

Regards,

Oscar Migueles


The merging process is described in section 7.2 of the package vignette. Peaks that overlap by at least one base between samples are "widened" to encompass the entire enriched region. We recommend using the summits parameter in dba.count() to center the peaks on the consensus summit and make them a uniform width.
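
To make the re-centering idea concrete, here is a minimal Python sketch. It is not DiffBind code: the function names are hypothetical, the summit position is simply taken as given (how DiffBind derives the consensus summit is not shown), and the width rule of 2 * summits + 1 reflects my reading of the dba.count() documentation, so treat it as an assumption.

```python
def recenter_on_summit(summit, half_width):
    """Return a closed interval (start, end) centered on the summit.

    half_width plays the role of the summits argument: the number of
    bases kept on each side of the summit (an assumption, see above).
    """
    return (summit - half_width, summit + half_width)


def peak_width(interval):
    """Width in bases of a closed interval (both ends included)."""
    start, end = interval
    return end - start + 1


# Hypothetical merged peaks of very different widths, each with a summit.
merged_peaks = [
    {"start": 100, "end": 2600, "summit": 1000},
    {"start": 5000, "end": 5300, "summit": 5120},
]

for peak in merged_peaks:
    recentered = recenter_on_summit(peak["summit"], half_width=250)
    print(recentered, peak_width(recentered))
```

Whatever the original merged widths were, every re-centered interval ends up the same size, which keeps very wide merged regions from dominating the counts.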

Regarding the second issue, I'll look into this further in the next day or so.


Just to clarify, we run (the equivalent of) a bedtools merge on all of the samples together, not independently. We then count how many samples contributed to each (merged) peak. If that number is at least minOverlap, the peak is included.
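
The merge-then-count logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DiffBind's implementation: the function names, the toy coordinates, and the half-open interval convention are all assumptions made for the example.

```python
def merge_and_count(samples):
    """Merge peaks from all samples together and track contributing samples.

    samples: dict mapping sample name -> list of (chrom, start, end) peaks,
    using half-open coordinates (end is exclusive).
    """
    tagged = sorted(
        (chrom, start, end, name)
        for name, peaks in samples.items()
        for (chrom, start, end) in peaks
    )
    merged = []  # entries are [chrom, start, end, set of contributing samples]
    for chrom, start, end, name in tagged:
        last = merged[-1] if merged else None
        # Overlap by at least one base: same chromosome, start before running end.
        if last is not None and last[0] == chrom and start < last[2]:
            last[2] = max(last[2], end)  # widen to cover the whole region
            last[3].add(name)
        else:
            merged.append([chrom, start, end, {name}])
    return merged


def consensus(samples, min_overlap=2):
    """Keep merged regions supported by at least min_overlap samples."""
    return [
        (chrom, start, end, len(names))
        for chrom, start, end, names in merge_and_count(samples)
        if len(names) >= min_overlap
    ]


# Hypothetical toy data: two conditions, two replicates each.
samples = {
    "cond1_rep1": [("chr1", 100, 200), ("chr1", 500, 600)],
    "cond1_rep2": [("chr1", 150, 260), ("chr1", 900, 950)],
    "cond2_rep1": [("chr1", 520, 580)],
    "cond2_rep2": [("chr1", 550, 650)],
}

all_sites = merge_and_count(samples)
kept = consensus(samples, min_overlap=2)
print(len(all_sites), "merged sites in total;", len(kept), "kept with min_overlap=2")
for site in kept:
    print(site)
```

In this toy data, six input peaks collapse into three merged sites, which is why the total is smaller than the sum of the per-sample intervals. The region at chr1:100-260 is found only in the two condition 1 replicates, yet it is kept with min_overlap=2, while the peak seen in a single sample is dropped.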

Thank you for all your help,

Regards,

Oscar Migueles