Question

DiffBind dropping peaks in ATAC-seq data

Nebat • 0

@67938a65

Last seen 9 months ago

United States

Hi all,

I'm new to ATAC-seq analysis and have recently been trying to use DiffBind to systematically identify the differential peaks that I've been seeing by eye when looking at macs2 output in IGV. I have two conditions in triplicate, and I have done combined macs2 runs on nucleosome-free regions for each condition as well as on the entire dataset. In IGV I can identify peaks in the pooled data that are unique to one condition or the other, but when I run DiffBind I get only a few differential peaks, or none, depending on the parameters. It seems I might only be getting hits for peaks that are present in both conditions and that differ significantly in read counts within the peak, but I'm not sure. Any tips or recommendations for analyzing this sort of ATAC-seq dataset with DiffBind would be greatly appreciated.

DiffBind ATACSeq • 806 views

updated 9 months ago by Malcolm Cook ★ 1.6k • written 9 months ago by Nebat • 0

score 0 · Answer 1 · 2023-07-26

It is difficult to know exactly what is going on without more information (script, output).

Generally, changes need to be consistent within the replicates of each sample group, and different between sample groups. The higher the variance within a sample group, the more replicates are required to be confident that changes are real. Pooling samples can mask this variance, so it is important to look at the counts for all of the replicates. You can retrieve the counts using the dba.peakset() function with bRetrieve=TRUE.
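To illustrate why pooled signal can look different by eye while a replicate-aware test finds nothing, here is a minimal Python sketch. The counts are hypothetical, and the Welch-style t statistic is only a stand-in for intuition; it is not the model DiffBind fits. Both peaks have the same pooled total in condition A, but only one of them is consistent across replicates.

```python
from statistics import mean, stdev

# Hypothetical normalized read counts for two peaks, three replicates per condition.
counts = {
    "peak_consistent": {"A": [120, 131, 118], "B": [12, 9, 15]},
    "peak_one_outlier": {"A": [350, 8, 11], "B": [10, 14, 9]},
}

def summarize(groups):
    """Return pooled totals and a Welch-style t statistic for conditions A and B."""
    a, b = groups["A"], groups["B"]
    # Standard error of the difference in means, allowing unequal variances.
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return {
        "pooled_A": sum(a),
        "pooled_B": sum(b),
        "welch_t": (mean(a) - mean(b)) / se,
    }

summaries = {name: summarize(groups) for name, groups in counts.items()}
for name, s in summaries.items():
    print(name, s["pooled_A"], s["pooled_B"], round(s["welch_t"], 2))
```

In the second peak a single replicate drives the pooled signal, so the statistic is small even though the pooled track would look condition-specific in a genome browser.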

Feel free to send me your full DBA object and I can have a look.

score 0 · Answer 2 · 2023-07-27

Nice to see you have 3 replicates per condition.

Try this:

Create two peak-sets independently, one per condition, using Genrich (or something like it), which appropriately handles multiple replicates (unlike macs2, which doesn't). Note: if using Genrich, I recommend you first try using parameter a=0.

Combine the peak-sets into a single reference peak-set based on overlap and/or proximity. R/Bioconductor's GenomicRanges can make this easy, as can bedtools.

Perform differential chromatin accessibility analysis on the reference peakset. I use csaw at this point, but DiffBind might perform well.
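The step of combining peak-sets by overlap and/or proximity can be sketched in a few lines of Python. This is only an illustration of the idea, not the GenomicRanges or bedtools implementation; the function name `merge_peaks`, the `max_gap` parameter, and the example coordinates are all hypothetical.

```python
from collections import defaultdict

def merge_peaks(peak_sets, max_gap=0):
    """Merge peaks from several peak sets into one reference set.

    Peaks are (chrom, start, end) tuples with half-open, BED-style coordinates.
    Peaks on the same chromosome are merged when they overlap, touch, or lie
    within max_gap bases of each other.
    """
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + max_gap:
                # Close enough: extend the current reference peak.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged

# Hypothetical per-condition peak calls.
cond_a = [("chr1", 100, 250), ("chr1", 900, 1100), ("chr2", 50, 120)]
cond_b = [("chr1", 200, 320), ("chr1", 1150, 1300), ("chr2", 500, 640)]

reference = merge_peaks([cond_a, cond_b], max_gap=100)
for peak in reference:
    print(*peak, sep="\t")
```

Peaks unique to one condition (such as the two chr2 peaks here) survive into the reference set, so they can be counted in every replicate and tested in the differential step.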