Search
Question: DiffBind - Difference in counting methodology when summits option in dba.count is on?
0
gravatar for julien.bryois
11 months ago by
julien.bryois0 wrote:

Hi,

I wanted to quantify a new sample at peak coordinates identified previously (using the summit option in dba.count). I noticed that the peak quantification using the previously identified coordinates were very different from the peak quantification that I obtained with the summit option in dba.count. Is the counting done differently when the summit option is on? Is there any way to quantify peak coordinates in the same way as when the summit option is on using peak coordinates as input?

Please see code below to reproduce my observation.

Thank you!

Best,

Julien

 

library("DiffBind”)
packageVersion("DiffBind")
#[1] ‘2.0.9’

meta_filt <- read.table("meta_2_replicates.txt",header=T)

d <- dba(sampleSheet=meta_filt,minOverlap=2,config=data.frame(RunParallel=TRUE, reportInit="DBA", DataType=DBA_DATA_GRANGES, AnalysisMethod=DBA_DESEQ2, minQCth=15, fragmentSize=300,bCorPlot=FALSE, th=0.05, bUsePval=FALSE))
d <- dba.count(d,summits=150,fragmentSize=300)

#get raw read counts and peak coordinates

d <- dba.count(d, peaks=NULL, score=DBA_SCORE_READS)
counts_raw <- dba.peakset(d, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)
counts_coordinate <- counts_raw[c(1:3)]
counts_coordinate$ID <- paste(counts_raw$CHR,counts_raw$START,counts_raw$END,sep="\t")

write.table(counts_coordinate,"peak_coordinate_two_sample_test.bed",col.names=F,row.names=F,sep="\t",quote=F)


### Peak quantification same peaks same samples but without summit option

meta_filt$Peaks <- "peak_coordinate_two_sample_test.bed"

d2 <-dba(sampleSheet=meta_filt,minOverlap=1,scoreCol=0,config=data.frame(RunParallel=TRUE, reportInit="DBA", DataType=DBA_DATA_GRANGES, AnalysisMethod=DBA_DESEQ2, minQCth=15, fragmentSize=300,bCorPlot=FALSE, th=0.05, bUsePval=FALSE))

d2 <- dba.count(d2,fragmentSize=300)

#get raw read counts and peak coordinates
d2 <- dba.count(d2, peaks=NULL, score=DBA_SCORE_READS)
counts_raw_2 <- dba.peakset(d2, bRetrieve=TRUE, DataType=DBA_DATA_FRAME)

head(counts_raw)

    CHR     START       END B16T1 B1T1

1 chr1     10169     10469   353   94
2 chr1  31904657  31904957   200   40
3 chr1  85742198  85742498   338   40
4 chr1  87379924  87380224   281   38
5 chr1 142967288 142967588   313  184
6 chr1 143543782 143544082   438  114

head(counts_raw_2)

   CHR     START       END B16T1 B1T1
1 chr1     10169     10469    17    7
2 chr1  31904657  31904957    37    9
3 chr1  85742198  85742498   338   40
4 chr1  87379924  87380224   273   38
5 chr1 142967288 142967588    10    5
6 chr1 143543782 143544082   164   61
ADD COMMENTlink modified 10 months ago by Rory Stark2.1k • written 11 months ago by julien.bryois0
0
gravatar for Rory Stark
10 months ago by
Rory Stark2.1k
CRUK, Cambridge, UK
Rory Stark2.1k wrote:

Hello Julien-

I finally got a chance to look at this in depth, and indeed you discovered a bug. I have checked in a fix that will appear soon as DiffBind 2.2.8.

The bug was in the code that runs after the "Re-centering peaks" message. The mapping quality filter (mapQCth), which is set to 15 by default, was not being propagated, so all reads were being counted (including, eg, multi-mapping reads). So your second approach, where you passed the re-centered peaks in explicitly, has the correct counts. You can accomplish a similar work-around as follows:

>  d <- dba.count(d,summits=TRUE,fragmentSize=300)
>  d <- dba.count(d,peaks=NULL,summits=150,score=DBA_SCORE_READS)

Cheers-

Rory

ADD COMMENTlink modified 10 months ago • written 10 months ago by Rory Stark2.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 144 users visited in the last hour