Question

How does DiffBind count the reads for each peak in each sample?

hfyuan2016 • 0

@hfyuan2016-23538

Australia

Hi, I am currently working on 12 ATAC-Seq samples using DiffBind. Following the tutorial, I first generated a dba object with the minOverlap parameter set to 2, which gave 60k peaks. I then counted the reads for each peak in each sample. However, I have noticed that some peaks have a high count in a sample even though they are not in that sample's input peak list. Can someone explain the reason for this discrepancy? For example, the count table shows that peak_1 has 120 counts in sample A, but when I go back and check the input peakset of sample A, peak_1 is not there.

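library(DiffBind)
# Build the DBA object; minOverlap = 2 keeps peaks found in at least two peaksets
dataOb <- dba(sampleSheet = "table.samplesheet.csv", minOverlap = 2)
# Count reads over the consensus peaks in every sample;
# summits = FALSE keeps the merged peak widths instead of re-centering on summits
dataOb <- dba.count(dataOb, summits = FALSE, minOverlap = 2,
                    score = DBA_SCORE_NORMALIZED,
                    bUseSummarizeOverlaps = TRUE)
# Export the binding matrix (one row per consensus peak)
write.csv(dataOb[["binding"]], file = "peaks.counts.csv")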

Many thanks Huifang

diffbind DiffBind • 843 views

updated 19 months ago by Rory Stark ★ 5.2k • written 19 months ago by hfyuan2016 • 0

score 2 · Accepted Answer · 2023-04-21

Peak calling is quite inexact and involves setting some kind of threshold, so a region with real signal can fall just below that threshold in a given sample. The binding matrix counts reads overlapping the consensus peaks in all samples, whether or not the consensus peak was called in any particular sample. It is not unusual to find substantial signal in a sample where a consensus peak was not called.
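
To make this concrete, here is a minimal base R sketch of the idea. It is a toy illustration only, not DiffBind's actual implementation: the peak coordinates, the read positions, and the helper count_in_interval() are all made up, and the consensus is built by simply spanning the two overlapping peaks.

```r
# Toy illustration in base R (not DiffBind code; all numbers are made up).
# Called peaks per sample: sample A has no peak called in this region.
peaks <- list(
  A = data.frame(start = numeric(0), end = numeric(0)),
  B = data.frame(start = 95,  end = 190),
  C = data.frame(start = 110, end = 200)
)

# Simplified consensus: B and C overlap, so span them with one interval.
all_peaks <- do.call(rbind, peaks)
consensus <- c(start = min(all_peaks$start), end = max(all_peaks$end))

# Was a peak overlapping the consensus interval called in each sample?
called <- sapply(peaks, function(p) {
  any(p$start <= consensus[["end"]] & p$end >= consensus[["start"]])
})

# Hypothetical read positions per sample.
reads <- list(
  A = c(105, 120, 150, 199, 250),
  B = c(101, 110, 130, 140, 160, 180, 300),
  C = c(90, 115, 125, 170, 175, 190)
)

# Hypothetical helper: number of reads falling inside an interval.
count_in_interval <- function(pos, start, end) sum(pos >= start & pos <= end)

# Reads are counted over the consensus interval in EVERY sample,
# including sample A, where no peak was called.
counts <- sapply(reads, count_in_interval,
                 start = consensus[["start"]], end = consensus[["end"]])

print(called)  # A = FALSE, B = TRUE, C = TRUE
print(counts)  # A = 4, B = 6, C = 5
```

Sample A ends up with a non-zero count for the consensus peak even though its own peak list is empty, which is the situation described in the question.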

You may also want to check how you are verifying that the peak was called in a sample, because the consensus (merged) peaks will not have the same coordinates as the original peaks when multiple called peaks overlap. There's also an option in dba.report() to include information regarding the peak calling status of the consensus peaks across each of the samples.
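
A small base R sketch of why an exact coordinate lookup can miss a peak that was in fact called. The coordinates are invented for illustration and this is not DiffBind code. For the dba.report() route, I believe the relevant argument is bCalled, but please confirm that in ?dba.report for your DiffBind version.

```r
# Toy coordinates (made up); base R only, not DiffBind code.
sample_A_peaks <- data.frame(start = c(100, 500), end = c(180, 560))
sample_B_peaks <- data.frame(start = 150,         end = 260)

# Merging the overlapping A (100-180) and B (150-260) peaks
# gives one wider consensus interval.
consensus <- data.frame(start = 100, end = 260)

# Exact coordinate match: fails because the merged interval is wider
# than the peak originally called in sample A.
exact_match <- any(sample_A_peaks$start == consensus$start &
                   sample_A_peaks$end   == consensus$end)

# Overlap test: two intervals overlap when each starts before the other ends.
overlaps <- any(sample_A_peaks$start <= consensus$end &
                sample_A_peaks$end   >= consensus$start)

print(exact_match)  # FALSE
print(overlaps)     # TRUE
```

So when checking whether a consensus peak was called in a sample, test for overlap with that sample's original peaks rather than looking for identical start and end positions.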