Question

DiffBind - Difference in counting methodology when summits option in dba.count is on?

0

Entering edit mode

julien.bryois • 0

@julienbryois-11773

Last seen 7.1 years ago

Hi,

I wanted to quantify a new sample at peak coordinates identified previously (using the summit option in dba.count). I noticed that the peak quantification using the previously identified coordinates were very different from the peak quantification that I obtained with the summit option in dba.count. Is the counting done differently when the summit option is on? Is there any way to quantify peak coordinates in the same way as when the summit option is on using peak coordinates as input?

Please see code below to reproduce my observation.

Thank you!

Best,

Julien

library("DiffBind”)
packageVersion("DiffBind")
#[1] ‘2.0.9’

meta_filt <- read.table("meta_2_replicates.txt",header=T)

d <- dba(sampleSheet=meta_filt,minOverlap=2,config=data.frame(RunParallel=TRUE, reportInit="DBA", DataType=DBA_DATA_GRANGES, AnalysisMethod=DBA_DESEQ2, minQCth=15, fragmentSize=300,bCorPlot=FALSE, th=0.05, bUsePval=FALSE))
d <- dba.count(d,summits=150,fragmentSize=300)

#get raw read counts and peak coordinates

d <- dba.count(d, peaks=NULL, score=DBA_SCORE_READS)
counts_raw <- dba.peakset(d, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)
counts_coordinate <- counts_raw[c(1:3)]
counts_coordinate$ID <- paste(counts_raw$CHR,counts_raw$START,counts_raw$END,sep="\t")

write.table(counts_coordinate,"peak_coordinate_two_sample_test.bed",col.names=F,row.names=F,sep="\t",quote=F)


### Peak quantification same peaks same samples but without summit option

meta_filt$Peaks <- "peak_coordinate_two_sample_test.bed"

d2 <-dba(sampleSheet=meta_filt,minOverlap=1,scoreCol=0,config=data.frame(RunParallel=TRUE, reportInit="DBA", DataType=DBA_DATA_GRANGES, AnalysisMethod=DBA_DESEQ2, minQCth=15, fragmentSize=300,bCorPlot=FALSE, th=0.05, bUsePval=FALSE))

d2 <- dba.count(d2,fragmentSize=300)

#get raw read counts and peak coordinates
d2 <- dba.count(d2, peaks=NULL, score=DBA_SCORE_READS)
counts_raw_2 <- dba.peakset(d2, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)

head(counts_raw)

    CHR     START       END B16T1 B1T1

1 chr1     10169     10469   353   94
2 chr1  31904657  31904957   200   40
3 chr1  85742198  85742498   338   40
4 chr1  87379924  87380224   281   38
5 chr1 142967288 142967588   313  184
6 chr1 143543782 143544082   438  114

head(counts_raw_2)

   CHR     START       END B16T1 B1T1
1 chr1     10169     10469    17    7
2 chr1  31904657  31904957    37    9
3 chr1  85742198  85742498   338   40
4 chr1  87379924  87380224   273   38
5 chr1 142967288 142967588    10    5
6 chr1 143543782 143544082   164   61

diffbind dba.count summits • 1.7k views

ADD COMMENT • link updated 7.3 years ago by Rory Stark ★ 5.2k • written 7.4 years ago by julien.bryois • 0

score 0 · Answer 1 · 2017-01-20

Hello Julien-

I finally got a chance to look at this in depth, and indeed you discovered a bug. I have checked in a fix that will appear soon as DiffBind 2.2.8.

The bug was in the code that runs after the "Re-centering peaks" message. The mapping quality filter (mapQCth), which is set to 15 by default, was not being propagated, so all reads were being counted (including, eg, multi-mapping reads). So your second approach, where you passed the re-centered peaks in explicitly, has the correct counts. You can accomplish a similar work-around as follows:

>  d <- dba.count(d,summits=TRUE,fragmentSize=300)
>  d <- dba.count(d,peaks=NULL,summits=150,score=DBA_SCORE_READS)

Cheers-

Rory