Question: DiffBind confuses coordinates of peaks
0
14 months ago by
gtechbio0
gtechbio0 wrote:

Hi All,

I'm analyzing ATAC-seq data and use MACS2 for peak calling and DiffBind for occupancy and affinity analysis.

In occupancy analysis when I try to find peaks that are unique to one group, it seems that DiffBind confuses the contigs and coordinates of the peaks. For example I get,

SU68  390602  392045  1444      * 2.191028e-03
SU39  136859  137210   352      * 2.175033e-03
SU79  564195  565710  1516      * 2.161226e-03

but contigs SU68, SU 39 and SU79 in reality are very short (ca 1kb each), so it is not possible that peaks are located at the coordinates that DiffBind displays.

MACS2 output files seem fine, so I guess this is either a bug in DiffBind or it somehow changes the coordinates of chromosomes. Any ideas?

dba.overlap(SU_vs_hybrid, SU_vs_hybrid$masks$Parent, mode = DBA_OLAP_RATE)
dba.overlap(SU_vs_hybrid, SU_vs_hybrid$masks$Hybrid, mode = DBA_OLAP_RATE)

SU_vs_hybrid_unique<-dba.peakset(SU_vs_hybrid, consensus = DBA_CONDITION,minOverlap = 0.33)

SU_vs_hybrid.OL<-dba.overlap(SU_vs_hybrid_unique,SU_vs_hybrid_unique$masks$Consensus)
only_parent_peaks<-as.data.frame(SU_vs_hybrid.OL$onlyA) Thanks Disclosure: Cross-posted to Biostars diffbind atacseq • 351 views ADD COMMENTlink modified 14 months ago by Rory Stark3.0k • written 14 months ago by gtechbio0 Hello- Can you email me your DBA object SU_vs_hybrid? I can have a look at what is going on. It would also help to know what version of DiffBind you are using (output of sessionInfo() -Rory ADD REPLYlink written 14 months ago by Rory Stark3.0k Hi Rory, Thanks for reply. I use DiffBind_2.4.8. If you don't mind, I will send the SessionInfo to your email (the forum has character limitations). ADD REPLYlink modified 14 months ago • written 14 months ago by gtechbio0 Answer: DiffBind confuses coordinates of peaks 1 14 months ago by Rory Stark3.0k CRUK, Cambridge, UK Rory Stark3.0k wrote: The issue is that when you convert a GRanges object to a data.frame, the seqnames field (which contains the chromosome or contig name) becomes a factor. If you try to look at them as numbers you get a factor number, which due to the sort order used in DiffBind, does not correspond directly to the chromosome/contig name. If you want to look at them as data.frames, you can use the levels() function to retrieve the corresponding character strings, or use the conversions built directly into DiffBind by setting DataType=DBA_DATA_FRAME, Here are some code snippets that show how this works. # Show the most distant peak on a contig allpeaks <- dba.peakset(SU_vs_hybrid, bRetrieve=TRUE, DataType = DBA_DATA_FRAME) contigs <- allpeaks[as.integer(allpeaks[,1]) > 18,] max(contigs[,2]) # Erroneous code that manually converts to data.frame then misuses factor numbers as contigs SU_vs_hybrid.OL <- dba.overlap(SU_vs_hybrid_unique, SU_vs_hybrid_unique$masks$Consensus) only_parent_peaks <- as.data.frame(SU_vs_hybrid.OL$onlyA)
contigs <- only_parent_peaks[as.integer(only_parent_peaks[,1]) > 18,]
max(contigs[,2])
# Correct code with internal data.frame conversion that uses character strings for contig names
SU_vs_hybrid.OL <- dba.overlap(SU_vs_hybrid_unique,
SU_vs_hybrid_unique$masks$Consensus,
DataType=DBA_DATA_FRAME)
only_parent_peaks<-SU_vs_hybrid.OL\$onlyA
contigs <- only_parent_peaks[as.integer(only_parent_peaks[,1]) > 18,]
max(contigs[,2])

Cheers-

Rory

Hi Rory,
I tried using DataType = DBA_DATA_FRAME, but the problem persists and I still get contigs with strange coordinates.