Question

Counting reads in fixed width peaks with DiffBind::dba.count without re-centering?

1

Entering edit mode

Ni-Ar ▴ 10

@ni-ar-15663

Last seen 6.0 years ago

Spain/Barcelona

Hello,

I'm using DiffBind version 2.6.6 to do ChIP-seq differential binding analysis for a transcription factor that binds at genes transcription start sites (TSS).

I generated the peaks intervals with MACS2 and since I'm specifically interested only in TSS regions I used bedtools interesect to intersect the TF peaks with a bed file of TSS +/- 1250bp. In this way I generate a bed file with peaks that are centred on the TSS and all have 2500bp width.

Now, when I move to DiffBind for the differential analysis I used the function dba.count() with the parameter summits set to 1250 since I've read that in this way I can:

Keep all the peaks with the same width
Re-center each peak around the point of greatest enrichment

I'm happy with the fact that the peaks maitain the same width but is it possible instead NOT to re-center the peaks but keep the original peaks coordinates I input to dba.count()? The re-centering does not drastically change the start and end position of the peaks, usually it moves them around of 5-10 bp centered on the peak summit instead of the TSS.

For example there's a peak with coordinates

chr6 91472296 91474796

that after using dba.count(dba, summits = 1250) it becomes

chr6 91472294 91474794

Here is how I create the dba object:

Peaks <- "~/TSS_intersected/All_TSS_Intersect_peaks_q0.001_2500bp.bed"

metadata <- data.frame(SampleID, ControlID, Tissue, Factor, Condition,
                       Replicate, bamReads, bamControl, Peaks, PeakCaller)

config_df <- data.frame(RunParallel = TRUE, reportInit = "TF_DBA",
                     DataType = DBA_DATA_FRAME, AnalysisMethod = DBA_EDGER,
                     minQCth = 10, fragmentSize = 250, bCorPlot = TRUE,
                     th = 0.1, bUsePval = FALSE, bFullLibrarySize = TRUE,
                     bReduceObjects = F, bUseSummarizeOverlaps = F)

dba <- dba(sampleSheet = metadata, config = config_df)

dba_cnt <- dba.count(dba, summits = 1250)

I'm interested in this because I'd like to compared my dataset with another TF ChiP-seq, so I care to have both peak-sets with the same coordinates.

Thank you!

diffbind chipseq counts peaks bedtools • 3.8k views

ADD COMMENT • link updated 4.8 years ago by Rory Stark ★ 5.3k • written 7.8 years ago by Ni-Ar ▴ 10

score 1 · Answer 1 · 2018-05-23

1

Entering edit mode

Rory Stark ★ 5.3k

@rory-stark-5741

Last seen 13 months ago

Cambridge, UK

If you do not want to re-center the peaks, there is no need to use the summits parameter. You can use the peaks parameter to specify the exact intervals you want:

> dba_cnt <- dba.count(dba, peaks=Peaks)

Cheers-

Rory

ADD COMMENT • link 7.7 years ago Rory Stark ★ 5.3k

0

Entering edit mode

Hi Rory-

I realize this is an old post, but I have a related question and stumbled across this answer. I'm wanting to allow variable-width peaks (I'd like to use the same intervals as specified by the MACS2 narrowPeak files I'm using, with merging across replicatesby DiffBind). I've tried dba_cnt <- dba.count(dba, peaks=Peaks) but I get the error "Error in dba.count(dba, peaks = Peaks) : object 'Peaks' not found". Peaks is a column in my samples file with paths to the narrowPeak files. What is Peaks meant to be?

Thanks very much!

ADD REPLY • link 4.8 years ago Beryl • 0

0

Entering edit mode

You don't need to specify anything for the peaks parameter; the generation of a consensus peakset consisting of variable length, merged peaks from specified MACS files is the default behavior, so long as your also specify summits=FALSE (which prevents the peaks from being reset to consistent intervals around the consensus summit).

The peaks= parameter is for when you have a pre-defined set of peaks to want to pass in so as to override the generation of a consensus peakset via overlaps.

ADD REPLY • link 4.8 years ago Rory Stark ★ 5.3k