DiffBind: dba.overlap error
@romicakerketta-14630
United States

I am using DiffBind for differential occupancy analysis of my ChIP-seq data set. Until last week, whenever I ran the dba.overlap function on my 4 samples (2 samples per condition), I got 4 overlap rates (82695, 51554, 34786, 24150). However, when I ran the same code today (after updating DiffBind to version 2.8.0), I got the following numbers, in which the first two values returned by dba.overlap are identical:

> olaprate <- dba.overlap(DBdata, mode = DBA_OLAP_RATE)
> olaprate
[1] 51554 51554 34786 24150
> plot(olaprate, type='b', ylab='# peaks', xlab='overlap at least this many peaksets')

I also ran the tamoxifen example from the vignette to see whether the problem is reproducible. This is what DiffBind gives me for the tamoxifen data:

> olap.rate <- dba.overlap(tamoxifen,mode=DBA_OLAP_RATE)
> olap.rate
 [1] 2845 2845 1773 1388 1074  817  653  484  384  202  129

So even in the vignette example the first two numbers are repeated (they should be 3795 2845 1773 1388 1074 817 653 484 384 202 129).

So I am wondering whether something in the package changed with the update?

Please help!

Thanks 

@romicakerketta-14630
United States

I think I have figured out the solution to my problem. I had to store the object created when DiffBind reads in the occupancy (peak) data under one name, and assign the object created when DiffBind counts reads to a different name, so that the peak-only object is not overwritten. The following code worked:

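library(DiffBind)

# Read in the peaksets listed in the sample sheet (occupancy data only)
DBdata_peak <- dba(sampleSheet=ikras)
DBdata_peak

# Create counts; assign the result to a NEW object so that
# DBdata_peak still holds all of the original peaks
DBdata <- dba.count(DBdata_peak)
DBdata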

Passing the DBdata_peak object (rather than the counted object) to dba.overlap then gave me the correct numbers:

> olaprate <- dba.overlap(DBdata_peak, mode = DBA_OLAP_RATE)
> olaprate
[1] 82695 51554 34786 24150


Yep, you got it. The DBA object returned by dba.count(), which forms the consensus peakset, loses information about peaks that are not in the consensus. Because the default is to include only peaks that appear in at least two peaksets, the number of peaks present in at least one sample and the number that overlap at least two samples are the same.
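To illustrate why the first two values coincide, here is a minimal base-R sketch. It is not DiffBind's internal code. It assumes a hypothetical binary occupancy matrix `occupancy` (rows are merged peak intervals, columns are samples, 1 means the sample has a peak there) and a hypothetical helper `overlap_rate()` that counts how many merged peaks are present in at least k samples for k = 1..n, which is the kind of vector the DBA_OLAP_RATE mode reports. Keeping only rows present in at least two samples, as a consensus peakset with a two-sample threshold would, forces the k = 1 and k = 2 counts to be equal.

```r
# Hypothetical occupancy matrix (illustration only, not real data):
# rows = merged peak intervals, columns = 4 samples, 1 = peak called
occupancy <- matrix(
  c(1, 0, 0, 0,
    1, 1, 0, 0,
    1, 1, 1, 0,
    1, 1, 1, 1,
    0, 1, 0, 0,
    0, 0, 1, 1),
  ncol = 4, byrow = TRUE
)

# Hypothetical helper: number of merged peaks present in at least k samples
overlap_rate <- function(occ) {
  hits <- rowSums(occ)
  sapply(seq_len(ncol(occ)), function(k) sum(hits >= k))
}

overlap_rate(occupancy)   # returns 6 4 2 1

# Mimic a consensus peakset that keeps only peaks found in >= 2 samples
consensus <- occupancy[rowSums(occupancy) >= 2, , drop = FALSE]
overlap_rate(consensus)   # returns 4 4 2 1: first two values now identical
```

Once the peaks seen in only one sample are dropped, "present in at least one sample" and "present in at least two samples" describe the same set of rows, which is the same pattern as the repeated 51554 and 2845 values in the question.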


Thank you for the explanation!
