DiffBind confuses coordinates of peaks
1
1
Entering edit mode
gtechbio ▴ 10
@gtechbio-13996
Last seen 10 months ago
Spain

Hi All,

I'm analyzing ATAC-seq data and use MACS2 for peak calling and DiffBind for occupancy and affinity analysis.

In occupancy analysis when I try to find peaks that are unique to one group, it seems that DiffBind confuses the contigs and coordinates of the peaks. For example I get,

 

SU68  390602  392045  1444      * 2.191028e-03
SU39  136859  137210   352      * 2.175033e-03
SU79  564195  565710  1516      * 2.161226e-03

 

but contigs SU68, SU 39 and SU79 in reality are very short (ca 1kb each), so it is not possible that peaks are located at the coordinates that DiffBind displays.

MACS2 output files seem fine, so I guess this is either a bug in DiffBind or it somehow changes the coordinates of chromosomes. Any ideas?

Here is my code after loading the peaks:

dba.overlap(SU_vs_hybrid, SU_vs_hybrid$masks$Parent, mode = DBA_OLAP_RATE)
dba.overlap(SU_vs_hybrid, SU_vs_hybrid$masks$Hybrid, mode = DBA_OLAP_RATE)


SU_vs_hybrid_unique<-dba.peakset(SU_vs_hybrid, consensus = DBA_CONDITION,minOverlap = 0.33)


SU_vs_hybrid.OL<-dba.overlap(SU_vs_hybrid_unique,SU_vs_hybrid_unique$masks$Consensus)
only_parent_peaks<-as.data.frame(SU_vs_hybrid.OL$onlyA)

Thanks

Disclosure: Cross-posted to Biostars

atacseq diffbind • 1.4k views
ADD COMMENT
0
Entering edit mode

Hello-

Can you email me your DBA object SU_vs_hybrid? I can have a look at what is going on.

It would also help to know what version of DiffBind you are using (output of sessionInfo()

-Rory

ADD REPLY
0
Entering edit mode

Hi Rory,
Thanks for reply. I use DiffBind_2.4.8.
If you don't mind, I will send the SessionInfo to your email (the forum has character limitations).

ADD REPLY
1
Entering edit mode
Rory Stark ★ 5.1k
@rory-stark-5741
Last seen 13 days ago
Cambridge, UK

The issue is that when you convert a GRanges object to a data.frame, the seqnames field (which contains the chromosome or contig name) becomes a factor. If you try to look at them as numbers you get a factor number, which due to the sort order used in DiffBind, does not correspond directly to the chromosome/contig name.

If you want to look at them as data.frames, you can use the levels() function to retrieve the corresponding character strings, or use the conversions built directly into DiffBind by setting DataType=DBA_DATA_FRAME,

Here are some code snippets that show how this works.

# Show the most distant peak on a contig
allpeaks <- dba.peakset(SU_vs_hybrid, bRetrieve=TRUE, DataType = DBA_DATA_FRAME)
contigs <- allpeaks[as.integer(allpeaks[,1]) > 18,]
max(contigs[,2])
# Erroneous code that manually converts to data.frame then misuses factor numbers as contigs
SU_vs_hybrid.OL <- dba.overlap(SU_vs_hybrid_unique,
                               SU_vs_hybrid_unique$masks$Consensus)
only_parent_peaks <- as.data.frame(SU_vs_hybrid.OL$onlyA)
contigs <- only_parent_peaks[as.integer(only_parent_peaks[,1]) > 18,]
max(contigs[,2])
# Correct code with internal data.frame conversion that uses character strings for contig names
SU_vs_hybrid.OL <- dba.overlap(SU_vs_hybrid_unique,
                               SU_vs_hybrid_unique$masks$Consensus,
                               DataType=DBA_DATA_FRAME)
only_parent_peaks<-SU_vs_hybrid.OL$onlyA
contigs <- only_parent_peaks[as.integer(only_parent_peaks[,1]) > 18,]
max(contigs[,2])

Cheers-

Rory

ADD COMMENT
0
Entering edit mode

Hi Rory,
Thanks for reply!
I tried using `DataType = DBA_DATA_FRAME`, but the problem persists and I still get contigs with strange coordinates.

ADD REPLY

Login before adding your answer.

Traffic: 749 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6