I am using DiffBind for differential occupancy analysis of my ChIP-seq data set. Until last week, running the "dba.overlap" function on my 4 samples (2 samples per condition) returned 4 overlap rates (82695, 51554, 34786, 24150). However, when I ran the same code today, after updating DiffBind to version 2.8.0, I got the numbers below: the first two values returned by dba.overlap are now identical.
> olaprate <- dba.overlap(DBdata, mode = DBA_OLAP_RATE)
> olaprate
[1] 51554 51554 34786 24150
> plot(olaprate, type='b', ylab='# peaks', xlab='overlap at least this many peaksets')
I also ran the Tamoxifen vignette to see whether the problem is reproduced there. This is what DiffBind returns for the Tamoxifen data:
> olap.rate <- dba.overlap(tamoxifen,mode=DBA_OLAP_RATE)
> olap.rate
[1] 2845 2845 1773 1388 1074 817 653 484 384 202 129
So even in the vignette the first two numbers repeat (the result should have been 3795 2845 1773 1388 1074 817 653 484 384 202 129).
So I am wondering whether something changed in the package during the update.
Please help!
Thanks
Yep, you got it. The DBA object that results after the consensus peakset is formed (by calling dba.count()) loses information about peaks that are not in the consensus. As the default is to include all peaks in at least two peaksets, the number of peaks that are in at least one sample and the number that overlap at least two samples are the same.
Thank you for the explanation!
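To illustrate the behavior described in the answer, here is a minimal base-R sketch that does not use DiffBind at all. The `membership` matrix and the `overlap_rate` helper are hypothetical names invented for this toy example; the filter at "at least two samples" only mimics the default consensus rule mentioned above and is not DiffBind's actual implementation.

```r
# Toy binary matrix: rows are merged peaks, columns are samples (made-up data).
# A 1 means the peak was called in that sample.
membership <- matrix(
  c(1, 0, 0, 0,
    1, 1, 0, 0,
    0, 1, 1, 0,
    1, 1, 1, 0,
    1, 1, 1, 1,
    0, 0, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("s1", "s2", "s3", "s4"))
)

# Hypothetical helper: for k = 1..n samples, count peaks called in at least k samples.
overlap_rate <- function(m) {
  n_called <- rowSums(m)
  vapply(seq_len(ncol(m)), function(k) sum(n_called >= k), integer(1))
}

# Overlap rate on the full set of merged peaks.
full_rate <- overlap_rate(membership)
print(full_rate)       # 6 4 2 1

# Keep only peaks found in at least two samples, mimicking the default consensus rule.
consensus <- membership[rowSums(membership) >= 2, , drop = FALSE]

# Every remaining peak is in >= 2 samples, so the first two counts coincide.
consensus_rate <- overlap_rate(consensus)
print(consensus_rate)  # 4 4 2 1
```

In the toy data the peaks seen in only one sample are dropped by the filter, so the "at least one" count falls to the "at least two" count, which is the same pattern as 51554 51554 34786 24150 above. If the answer's explanation holds, computing the overlap rate on the DBA object before calling dba.count() should give back the original first value.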