Question

Binding matrix created by DiffBind

0

Entering edit mode

sh.kazempour94 ▴ 10

@shkazempour94-21879

Last seen 13 months ago

United States

Hi bioconductor team,

I'm analyzing atac-seq data with 4 genotypes and two biological replicate for each of them (8 samples overall). I used macs2 callpeak for peak calling and then used the .bam files and .narrowPeak files to apply Diffbind and get the binding matrix. Also, I'm working with DiffBind_2.14.0. I need to mention that the bam files have gone through the quality control and reads with MAPQ<30 are removed.

I know that all my .narrowPeak files contain peaks in HOXa region (I annotated every single of them and hoxa regions overlap within all samples). However, I cannot find this region in the total binding matrix. I even set minOverlap=1 to get total number of unique peaks after merging overlapping ones, and then annotated all the regions but HOXa was not found.

So, I believe that DiffBind is filtering some of the regions somehow. My question is that what filtering steps the DiffBind is applying in order to generate the binding matrix?

dbaObj <-  dba(sampleSheet=cellSamples,peakCaller="narrow", 
     config=data.frame(doBlacklist=FALSE, doGreylist=FALSE, singleEnd=FALSE, yieldSize=5000000, 
                                            bUsePval=TRUE,th=0.01), scoreCol=5, filter=0)

When I annotated all the regions from dba$merged I could find Skap2 but not HOXa. Below is a screenshot from uploading bw files to igv, showing the peaks for Skap2 and hoxa. peaks from Skap2 vs. hoxa

Hoxa genes go from chr6:52,105,000-52,217,000, and this interval is available in all .narrowPeak files, but how come DiffBind remove it when making the binding matrix? If the dba() function does any filtering can you please explain it to me?

Many thanks,

Shiva.

ATACSeq DiffBind • 1.5k views

ADD COMMENT • link 2.8 years ago sh.kazempour94 ▴ 10

score 0 · Answer 1 · 2021-06-15

0

Entering edit mode

Rory Stark ★ 5.1k

@rory-stark-5741

Last seen 7 days ago

Cambridge, UK

You should not be accessing dba$merged directly; try retrieving the binding matrix using

merged <- dba.peakset(dbaObj, bRetrieve=TRUE)

and confirm the expected peaks are missing.

If they are still absent in the returned GRanges object, please email me a copy to your dbaObj (or link where I can access it) and I can see what is going on.

ADD COMMENT • link 2.8 years ago Rory Stark ★ 5.1k

0

Entering edit mode

Thank you for your response. Yes I found the regions related to "hoxa" when I used dba.peakset(). So why do the regions from dbaObj$binding and dba.peakset() differ? (the dim of both data frames are the same though)

table(merged$chr)

        chr1        chr10        chr11        chr12        chr13        chr14 
        3646         2742         3629         2273         2420         2189 
       chr15        chr16        chr17        chr18        chr19         chr2 
        2142         1897         2389         1648         1566         4260 
        chr3         chr4         chr5         chr6         chr7         chr8 
        2693         3126         3254         3045         3009         2597 
        chr9 chrUn_random         chrX 
        2808           46         1048 

table(binding$chr)

 chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19  chr2 chr20 
 3646  1648  1566  4260  2693  3126  3254  3045  3009  2597  2808  2742    46 
chr21  chr3  chr4  chr5  chr6  chr7  chr8  chr9 
 1048  3629  2273  2420  2189  2142  1897  2389

Ultimately I need the binding matrix to do DB analysis later. Based on my understanding, dba.count() uses dbaObj$binding matrix in order to count reads in binding site intervals, and the dbaObj$binding miss some regions. What is the reason for that? And which one is correct?

Thanks again.

ADD REPLY • link 2.8 years ago sh.kazempour94 ▴ 10

1

Entering edit mode

Everything that is supposed to be there is there! The internal $binding matrix uses a type of factor representation of chromosome names, so the chromosome numbered 6 is not "chr6". dba.peakset() with bRetrieve=TRUE takes care of converting chromosome name strings (with the help of the internal vector $chrmap).

ADD REPLY • link 2.8 years ago Rory Stark ★ 5.1k

0

Entering edit mode

Thank you for helping me figuring this out! The matrix from dba.peakset() seems to be the correct one. Thanks!

ADD REPLY • link 2.8 years ago sh.kazempour94 ▴ 10