Hello,
I'm using DiffBind
version 2.6.6 to do ChIP-seq differential binding analysis for a transcription factor that binds at genes transcription start sites (TSS).
I generated the peaks intervals with MACS2 and since I'm specifically interested only in TSS regions I used bedtools interesect
to intersect the TF peaks with a bed file of TSS +/- 1250bp. In this way I generate a bed file with peaks that are centred on the TSS and all have 2500bp width.
Now, when I move to DiffBind
for the differential analysis I used the function dba.count()
with the parameter summits
set to 1250 since I've read that in this way I can:
- Keep all the peaks with the same width
- Re-center each peak around the point of greatest enrichment
I'm happy with the fact that the peaks maitain the same width but is it possible instead NOT to re-center the peaks but keep the original peaks coordinates I input to dba.count()
? The re-centering does not drastically change the start and end position of the peaks, usually it moves them around of 5-10 bp centered on the peak summit instead of the TSS.
For example there's a peak with coordinates
chr6 91472296 91474796
that after using dba.count(dba, summits = 1250)
it becomes
chr6 91472294 91474794
Here is how I create the dba object:
Peaks <- "~/TSS_intersected/All_TSS_Intersect_peaks_q0.001_2500bp.bed"
metadata <- data.frame(SampleID, ControlID, Tissue, Factor, Condition,
Replicate, bamReads, bamControl, Peaks, PeakCaller)
config_df <- data.frame(RunParallel = TRUE, reportInit = "TF_DBA",
DataType = DBA_DATA_FRAME, AnalysisMethod = DBA_EDGER,
minQCth = 10, fragmentSize = 250, bCorPlot = TRUE,
th = 0.1, bUsePval = FALSE, bFullLibrarySize = TRUE,
bReduceObjects = F, bUseSummarizeOverlaps = F)
dba <- dba(sampleSheet = metadata, config = config_df)
dba_cnt <- dba.count(dba, summits = 1250)
I'm interested in this because I'd like to compared my dataset with another TF ChiP-seq, so I care to have both peak-sets with the same coordinates.
Thank you!
Hi Rory-
I realize this is an old post, but I have a related question and stumbled across this answer. I'm wanting to allow variable-width peaks (I'd like to use the same intervals as specified by the MACS2 narrowPeak files I'm using, with merging across replicatesby DiffBind). I've tried dba_cnt <- dba.count(dba, peaks=Peaks) but I get the error "Error in dba.count(dba, peaks = Peaks) : object 'Peaks' not found". Peaks is a column in my samples file with paths to the narrowPeak files. What is Peaks meant to be?
Thanks very much!
You don't need to specify anything for the
peaks
parameter; the generation of a consensus peakset consisting of variable length, merged peaks from specified MACS files is the default behavior, so long as your also specifysummits=FALSE
(which prevents the peaks from being reset to consistent intervals around the consensus summit).The
peaks=
parameter is for when you have a pre-defined set of peaks to want to pass in so as to override the generation of a consensus peakset via overlaps.