Question: Counting reads in fixed width peaks with DiffBind::dba.count without re-centering?
1
19 months ago by
Ni-Ar0
Germany
Ni-Ar0 wrote:

Hello,

I'm using DiffBind version 2.6.6 to do ChIP-seq differential binding analysis for a transcription factor that binds at genes transcription start sites (TSS).

I generated the peaks intervals with MACS2 and since I'm specifically interested only in TSS regions I used bedtools interesect  to intersect the TF peaks with a bed file of TSS +/- 1250bp. In this way I generate a bed file with peaks that are centred on the TSS and all have 2500bp width.

Now, when I move to DiffBind for the differential analysis I used the function dba.count() with the parameter summits set to 1250 since I've read that in this way I can:

1. Keep all the peaks with the same width
2. Re-center each peak around the point of greatest enrichment

I'm happy with the fact that the peaks maitain the same width but is it possible instead NOT to re-center the peaks but keep the original peaks coordinates I input to dba.count()The re-centering does not drastically change the start and end position of the peaks, usually it moves them around of 5-10 bp centered on the peak summit instead of the TSS.

For example there's a peak with coordinates

chr6 91472296 91474796


that after using dba.count(dba, summits = 1250) it becomes

chr6 91472294 91474794


Here is how I create the dba object:

Peaks <- "~/TSS_intersected/All_TSS_Intersect_peaks_q0.001_2500bp.bed"

metadata <- data.frame(SampleID, ControlID, Tissue, Factor, Condition,
                       Replicate, bamReads, bamControl, Peaks, PeakCaller)

config_df <- data.frame(RunParallel = TRUE, reportInit = "TF_DBA",
DataType = DBA_DATA_FRAME, AnalysisMethod = DBA_EDGER,
minQCth = 10, fragmentSize = 250, bCorPlot = TRUE,
th = 0.1, bUsePval = FALSE, bFullLibrarySize = TRUE,
bReduceObjects = F, bUseSummarizeOverlaps = F)

dba <- dba(sampleSheet = metadata, config = config_df)

dba_cnt <- dba.count(dba, summits = 1250)

I'm interested in this because I'd like to compared my dataset with another TF ChiP-seq, so I care to have both peak-sets with the same coordinates.

Thank you!

modified 18 months ago by Rory Stark3.0k • written 19 months ago by Ni-Ar0
1
18 months ago by
Rory Stark3.0k
CRUK, Cambridge, UK
Rory Stark3.0k wrote:

If you do not want to re-center the peaks, there is no need to use the summits parameter. You can use the peaks parameter to specify the exact intervals you want:

> dba_cnt <- dba.count(dba, peaks=Peaks)

Cheers-

Rory