Question: DiffBind confuses coordinates of peaks
gravatar for gtechbio
12 months ago by
gtechbio0 wrote:

Hi All,

I'm analyzing ATAC-seq data and use MACS2 for peak calling and DiffBind for occupancy and affinity analysis.

In occupancy analysis when I try to find peaks that are unique to one group, it seems that DiffBind confuses the contigs and coordinates of the peaks. For example I get,


SU68  390602  392045  1444      * 2.191028e-03
SU39  136859  137210   352      * 2.175033e-03
SU79  564195  565710  1516      * 2.161226e-03


but contigs SU68, SU 39 and SU79 in reality are very short (ca 1kb each), so it is not possible that peaks are located at the coordinates that DiffBind displays.

MACS2 output files seem fine, so I guess this is either a bug in DiffBind or it somehow changes the coordinates of chromosomes. Any ideas?

Here is my code after loading the peaks:

dba.overlap(SU_vs_hybrid, SU_vs_hybrid$masks$Parent, mode = DBA_OLAP_RATE)
dba.overlap(SU_vs_hybrid, SU_vs_hybrid$masks$Hybrid, mode = DBA_OLAP_RATE)

SU_vs_hybrid_unique<-dba.peakset(SU_vs_hybrid, consensus = DBA_CONDITION,minOverlap = 0.33)



Disclosure: Cross-posted to Biostars

diffbind atacseq • 315 views
ADD COMMENTlink modified 11 months ago by Rory Stark2.8k • written 12 months ago by gtechbio0


Can you email me your DBA object SU_vs_hybrid? I can have a look at what is going on.

It would also help to know what version of DiffBind you are using (output of sessionInfo()


ADD REPLYlink written 12 months ago by Rory Stark2.8k

Hi Rory,
Thanks for reply. I use DiffBind_2.4.8.
If you don't mind, I will send the SessionInfo to your email (the forum has character limitations).

ADD REPLYlink modified 12 months ago • written 12 months ago by gtechbio0
Answer: DiffBind confuses coordinates of peaks
gravatar for Rory Stark
11 months ago by
Rory Stark2.8k
CRUK, Cambridge, UK
Rory Stark2.8k wrote:

The issue is that when you convert a GRanges object to a data.frame, the seqnames field (which contains the chromosome or contig name) becomes a factor. If you try to look at them as numbers you get a factor number, which due to the sort order used in DiffBind, does not correspond directly to the chromosome/contig name.

If you want to look at them as data.frames, you can use the levels() function to retrieve the corresponding character strings, or use the conversions built directly into DiffBind by setting DataType=DBA_DATA_FRAME,

Here are some code snippets that show how this works.

# Show the most distant peak on a contig
allpeaks <- dba.peakset(SU_vs_hybrid, bRetrieve=TRUE, DataType = DBA_DATA_FRAME)
contigs <- allpeaks[as.integer(allpeaks[,1]) > 18,]
# Erroneous code that manually converts to data.frame then misuses factor numbers as contigs
SU_vs_hybrid.OL <- dba.overlap(SU_vs_hybrid_unique,
only_parent_peaks <-$onlyA)
contigs <- only_parent_peaks[as.integer(only_parent_peaks[,1]) > 18,]
# Correct code with internal data.frame conversion that uses character strings for contig names
SU_vs_hybrid.OL <- dba.overlap(SU_vs_hybrid_unique,
contigs <- only_parent_peaks[as.integer(only_parent_peaks[,1]) > 18,]



ADD COMMENTlink modified 11 months ago • written 11 months ago by Rory Stark2.8k

Hi Rory,
Thanks for reply!
I tried using `DataType = DBA_DATA_FRAME`, but the problem persists and I still get contigs with strange coordinates.

ADD REPLYlink written 11 months ago by gtechbio0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 142 users visited in the last hour