DiffBind: 0 consensus peaks for dba.peakset: New bug / undocumented change introduced between version 2.14 and 3.4.11
2
0
Entering edit mode
chrarnold • 0
@chrarnold-7603
Last seen 4 weeks ago
Germany

We have a data pipeline that served us well for years that includes DiffBind, which generates consensus peak files with different minOverlap values for a set of peak files. The code used in DiffBind is very simple, see below, and worked flawlessly so far. We however recently updated our Singularity image, and we now work with a newer DiffBind package version. Unfortunately, the new(er) versions either introduce a new bug (see below) or at least an undocumented and hard to understand behaviour that results in 0 consensus peak sets. I have reproducible code that runs just fine when running with DIffBind 2.14, but results in 0 consensus peaks with a newer DiffBind version. Can someone help?

I can provide the rds file of the dba object if needed for both R versions (they seem to be a bit different, as I get an error when I try to lrun a DiffBind v3 object within DiffBind v2.


> R.version.string
[1] "R version 3.6.2 (2019-12-12)"
> packageVersion("DiffBind")
[1] ‘2.14.0’

dba.peakset(dba, minOverlap = 1)

9 Samples, 33498 sites in matrix:
ID Caller Intervals
1               1 narrow     18881
2               2 narrow     15370
3               3 narrow     16446
4               4 narrow     13283
5               5 narrow     20127
6               6 narrow     15987
7               7 narrow     21918
8               8 narrow     15457
9 1-2-3-4-5-6-7-8 narrow     33498

> R.version.string
[1] "R version 4.1.2 (2021-11-01)"
>packageVersion("DiffBind")
[1] ‘3.4.11’

dba.peakset(dba, minOverlap = 1)

9 Samples, 22954 sites in matrix (33498 total):
1  1 narrow     18881    NA
2  2 narrow     15370    NA
3  3 narrow     16446    NA
4  4 narrow     13283    NA
5  5 narrow     20127    NA
6  6 narrow     15987    NA
7  7 narrow     21918    NA
8  8 narrow     15457    NA
9  9                0     0


The problem here is the 9th and last row, which (used to) stores the consensus peakset. It is 0 for the new version.

An rds version of the dba object for reproducing the issue can be downloaded here: Hyperlink removed, as issue is solved

DiffBind • 153 views
0
Entering edit mode
Rory Stark ★ 4.4k
@rory-stark-5741
Last seen 13 days ago
CRUK, Cambridge, UK

The object in the .rds file, sampleMetaData.df, is already a DBA object, with eight samples:

> sampleMetaData.df <- readRDS("~/Downloads/DiffBind_issue_dba.obj_R4.1.2.rds")
8 Samples, 22954 sites in matrix (33498 total):
ID Intervals
1  1     18881
2  2     15370
3  3     16446
4  4     13283
5  5     20127
6  6     15987
7  7     21918
8  8     15457


However you can not pass in a DBA object as a sample sheet:

> mydba <- dba(sampleSheet = sampleMetaData.df)
> Error in 1:nrow(samples) : argument of length 0


Using it directly as a DBA object, I can reproduce what you are seeing. I'll take you word that it worked previously to generate a consensus peakset, but that is not how it is documented. As specified, the minOverlap parameter only comes into effect when either a) the consensus parameter is specified

> dba.peakset(sampleMetaData.df, minOverlap = 1, consensus=DBA_CALLER)
9 Samples, 33498 sites in matrix:
ID Intervals
1   1     18881
2   2     15370
3   3     16446
4   4     13283
5   5     20127
6   6     15987
7   7     21918
8   8     15457
9 ALL     33498


or b) when the peaks parameter is specified as a vector of sample numbers:

> dba.peakset(sampleMetaData.df, minOverlap = 1, peaks=1:8)
9 Samples, 33498 sites in matrix:
ID Intervals
1               1     18881
2               2     15370
3               3     16446
4               4     13283
5               5     20127
6               6     15987
7               7     21918
8               8     15457
9 1-2-3-4-5-6-7-8     33498


The latter case gives you what you want.

It is possible that the behavior has changed between DiffBind_2 and DiffBind_3 (which is one of the reasons for a major version bump), however it is currently working as documented (with the behavior of dba.peakset() being undefined when minOverlap is specified but not either consensus or peaks).

0
Entering edit mode
chrarnold • 0
@chrarnold-7603
Last seen 4 weeks ago
Germany

Thanks a lot for your quick reply, it is very appreciated! Indeed, with your 2 suggested modifications, I can again reproduce the results from before, perfect. A couple of clarifications and suggestions:

• the sampleMetadata.df I have in my code is indeed a data frame, I didnt mention this explicitly enough but it was never a dba object. This is why I shared the dba object so you can reproduce.
• The documentation for ?dba.peakset says for minOverlap: "the minimum number of peaksets a peak must be in to be included when adding a consensus peakset. When retrieving, if the peaks parameter is a vector (logical mask or vector of peakset numbers), a binding matrix will be retrieved including all peaks in at least this many peaksets. If minOverlap is between zero and one, peak will be included from at least this proportion of peaksets.". It doesnt mention anywhere here that this is only valid when either peaks or consensus is set, and it worked definitely like this for DiffBind v. 2.14. Maybe clarifying this in the R help then would be a good idea. If the behavior is undefined, it should rather throw an error and not fail silently as it does now - because 0 could be a "real" result, but often it is not. .

Again, thanks a lot for clarifying this here!

0
Entering edit mode

I agree the documentation could be clearer -- it took me a while to understand what it said so I could address your issue!