dba.count error in DiffBind
1
2
Entering edit mode
shohei.hori ▴ 20
@shoheihori-11109
Last seen 5.3 years ago

Hi,

I have been using DiffBind to perform differential binding analysis on ChIP-seq data, comparing two groups with 3 replicates for each. I am now facing an error when using its dba.count function.

I created a DBA object, Expt, using a sample sheet.

Expt <- dba(sampleSheet=”Expt.csv”)

After creating consensus peaksets by

Expt <- dba.peakset(Expt, consensus = -DBA_REPLICATE)

I used the dba.count function as follows.

Expt <- dba.count(Expt, peaks=Expt$masks$Consensus)

I then got an error message as follows.

Error in pv.counts(DBA, peaks = peaks, minOverlap = minOverlap,defaultScore = score,

Can’t count: some peaksets are not associated with a .bam file.

What is weird is that, when I used dba.count function on all the samples, without creating consensus peaksets, it works without showing any error message. In addition, this error message did not appear, when I used an older version of DiffBind. This error happens after I have updated R to ver. 3.3.1 and downloaded the latest version of DiffBind.

Thank you so much.

diffbind • 2.2k views
1
Entering edit mode
Rory Stark ★ 4.1k
@rory-stark-5741
Last seen 16 days ago
CRUK, Cambridge, UK

It looks like you are trying to count reads against a consensus peaksets made up of peaks that are in at least two replicates of at least two sample groups, is that correct? So if you have two sample groups (say WT and AT), and three replicates of each, the call to dba.peakset() with consensus = -DBA_REPLICATE will create two consensus peaksets, one for each sample group, each including peaks that overlap in at least two replicates. When you call dba.count(), you want to get peaks that overlap in both of these consensus sets, is that right?

The way to do this requires a few more steps. First, generate the sample-group consensus sets as you have done, but store it in a new object:

> ExptConsensus <-  dba.peakset(Expt, consensus = -DBA_REPLICATE)

Next limit this to only the consensus peaks:

> ExptConsensus <-  dba(ExptConsensus, mask=ExptConsensus$masks$Consensus)

Now retrieve the consensus of these:

> ConsensusPeaks <- dba.peakset(ExptConsensus, bRetrieve=TRUE)

And pass that in to dba.count():

> Expt <- dba.count(Expt, peaks=ConsensusPeaks)

The key is that when you use dba.peakset() with consensus = -DBA_REPLICATE, it adds new peaksets to the DBA object. These peaksets are not associated with an individual sample, but rather formed from a consensus of multiple samples. Therefore it doesn't make sense to use dba.count() to count reads, as reads are associated with individual samples. The bottom line is that each sample in the DBA object being counted needs to be associated with a set of reads (a bam file); a consensus peakset by itself is no longer associated with any one bam file.

Hope this helps-

Rory

0
Entering edit mode

A couple more thoughts on this:

First, I looked through the vignette and I see there is a reference to accomplishing this in exactly the way you originally tried:

Expt <- dba.count(Expt, peaks=Expt$masks$Consensus)

I'll fix this reference in the next release.

The other issue is that limiting the consensus set to sites that are identified in at least two replicates of both conditions may be too restrictive, depending on what questions you are trying answer. It would reasonable to expect that you would exclude many potentially interesting differentially bound sites this way. If you want to include sites that are identified in at least two replicates of either sample group, set minOverlap=1 in the second line of code in my original answer:

> ExptConsensus <-  dba(ExptConsensus, mask=ExptConsensus$masks$Consensus, minOverlap=1)

Cheers-

Rory

0
Entering edit mode

Thank you very much for the detailed comments, which are very helpful. It works now. What I wanted to try was to have a consensus peak set that corresponds to the sites identified in at least two replicates of either sample group, as you recommended. So, I will set minOverlap=1 in that command line.

Best regards,

Shohei

0
Entering edit mode

Hello,

I have another question related to this. When I used the dba.count function as you suggested using an older version (1.12.2) of DiffBind, I got warning messages as follows.

In sampvec | pv.whichCalled(spare, samp, masternum): the length of the longer object is not a multiple of that of the shorter one. (This is translated from the message in the Japanese version of R, so it may be different in the English version...)

Despite those messages, the dba.count function apparently worked because I was able to perform downstream analyses such as dba.contrast and dba.analyze functions... In the latest version of DiffBind, such warning messages did not appear.

I wonder whether I can ignore those warning messages and, if not, what should I do to avoid them.

Thank you very much for your kind help.

Shohei

0
Entering edit mode

Hi Shohei-

There was a bug in DiffBind 1.12 that has been fixed. It does not impact analyses, but could impact viewing which samples had peaks called (bCalled=TRUE and/or bCalledDetail=TRUE in dba.report()) or looking at some overlaps (ie dba.plotVenn() after counting).

So yes you you can ignore. I do advise updating if you can as 1.12 is at least three version old and no longer really supported.

-Rory

0
Entering edit mode

Hi Rory,

Thank you very much for your prompt response. I am relieved to know that the analyses were not affected by the bug. I will use the latest version of DiffBind.

Shohei