Counting reads in fixed width peaks with DiffBind::dba.count without re-centering?
1
1
Entering edit mode
Ni-Ar ▴ 10
@ni-ar-15663
Last seen 14 months ago
Spain/Barcelona

Hello,

I'm using DiffBind version 2.6.6 to do ChIP-seq differential binding analysis for a transcription factor that binds at genes transcription start sites (TSS). 

I generated the peaks intervals with MACS2 and since I'm specifically interested only in TSS regions I used bedtools interesect  to intersect the TF peaks with a bed file of TSS +/- 1250bp. In this way I generate a bed file with peaks that are centred on the TSS and all have 2500bp width.

Now, when I move to DiffBind for the differential analysis I used the function dba.count() with the parameter summits set to 1250 since I've read that in this way I can:

  1. Keep all the peaks with the same width
  2. Re-center each peak around the point of greatest enrichment

I'm happy with the fact that the peaks maitain the same width but is it possible instead NOT to re-center the peaks but keep the original peaks coordinates I input to dba.count()The re-centering does not drastically change the start and end position of the peaks, usually it moves them around of 5-10 bp centered on the peak summit instead of the TSS.

For example there's a peak with coordinates

chr6 91472296 91474796

that after using dba.count(dba, summits = 1250) it becomes

chr6 91472294 91474794

Here is how I create the dba object:

Peaks <- "~/TSS_intersected/All_TSS_Intersect_peaks_q0.001_2500bp.bed"

metadata <- data.frame(SampleID, ControlID, Tissue, Factor, Condition,
                       Replicate, bamReads, bamControl, Peaks, PeakCaller)

config_df <- data.frame(RunParallel = TRUE, reportInit = "TF_DBA",
                     DataType = DBA_DATA_FRAME, AnalysisMethod = DBA_EDGER,
                     minQCth = 10, fragmentSize = 250, bCorPlot = TRUE,
                     th = 0.1, bUsePval = FALSE, bFullLibrarySize = TRUE,
                     bReduceObjects = F, bUseSummarizeOverlaps = F)

dba <- dba(sampleSheet = metadata, config = config_df)

dba_cnt <- dba.count(dba, summits = 1250)

I'm interested in this because I'd like to compared my dataset with another TF ChiP-seq, so I care to have both peak-sets with the same coordinates. 

Thank you!

diffbind chipseq counts peaks bedtools • 694 views
ADD COMMENT
1
Entering edit mode
Rory Stark ★ 3.9k
@rory-stark-5741
Last seen 37 minutes ago
CRUK, Cambridge, UK

If you do not want to re-center the peaks, there is no need to use the summits parameter. You can use the peaks parameter to specify the exact intervals you want:

> dba_cnt <- dba.count(dba, peaks=Peaks)

Cheers-

Rory

ADD COMMENT
0
Entering edit mode

Hi Rory-

I realize this is an old post, but I have a related question and stumbled across this answer. I'm wanting to allow variable-width peaks (I'd like to use the same intervals as specified by the MACS2 narrowPeak files I'm using, with merging across replicatesby DiffBind). I've tried dba_cnt <- dba.count(dba, peaks=Peaks) but I get the error "Error in dba.count(dba, peaks = Peaks) : object 'Peaks' not found". Peaks is a column in my samples file with paths to the narrowPeak files. What is Peaks meant to be?

Thanks very much!

ADD REPLY
0
Entering edit mode

You don't need to specify anything for the peaks parameter; the generation of a consensus peakset consisting of variable length, merged peaks from specified MACS files is the default behavior, so long as your also specify summits=FALSE (which prevents the peaks from being reset to consistent intervals around the consensus summit).

The peaks= parameter is for when you have a pre-defined set of peaks to want to pass in so as to override the generation of a consensus peakset via overlaps.

ADD REPLY

Login before adding your answer.

Traffic: 460 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6