Question

DiffBind for ATAC-seq

0

Entering edit mode

igor ▴ 50

@igor

Last seen 13 months ago

United States

Normally, DiffBind is used for ChIP-seq data, but I've seen it used for ATAC-seq as well. I don't see any issues with that. MACS can be used for the initial peak calling. However, when performing peak calling for ATAC-seq data, it is advised to adjust the default parameters so peaks are centered on the cutting sites. A common suggestion is `--nomodel --shift -100 --extsize 200` or `--nomodel --shift 37 --extsize 73`.

If the reads are adjusted for peak calling, what happens when I then pass the peaks and the BAMs to DiffBind? Wouldn't DiffBind quantify the peaks incorrectly since the BAMs and peaks do not really represent the same data?

atac-seq diffbind macs2 macs peak finding • 9.4k views

ADD COMMENT • link updated 7.4 years ago by BioinfGuru ▴ 30 • written 7.9 years ago by igor ▴ 50

score 1 · Answer 1 · 2016-08-30

Hi,

Centering the peak on the cuttting site is good. I haven't looked specifically at ATAC data, but for what it's worth...

You can use the fragmentSize parameter of dba.count to ensure (mostly) that the reads overlap the cutting site. If the actual fragment length is on average, say, 300bp, then setting fragmentSize=300 will cause DiffBind to extend the reads (strand-specifically) to that length, so that each read *should* cover the true site.

Also, don't make the specified peak too narrow. Center it by all means, but if the peak is too narrow, then in a strong site, say, many reads may be a bit too far up- or downstream of the peak to be counted.

Cheers,

- Gord

score 1 · Answer 2 · 2017-03-17

1

Entering edit mode

BioinfGuru ▴ 30

@yagalbi-11519

Last seen 2 days ago

Ireland

I can't be too clear on this answer but as I understand it MACS2 was designed for chip-seq and as such will always "shift" the reads in the 5' prime direction due to some detail in the production of the chip-seq data . We use the option --shift 100 to move ATAC-seq reads in the opposite direction so that when MACS2 moves the reads.....they are returned to their original true position. So this should not be an issue for DiffBind. However, ATAC seq data is also adjusted for TN5 insertion of 9bp adapters....in this case the question posted is valid.... "If the reads are adjusted for peak calling, what happens when I then pass the peaks and the BAMs to DiffBind? Wouldn't DiffBind quantify the peaks incorrectly since the BAMs and peaks do not really represent the same data?"

For better detail and explanation about --shift:

https://groups.google.com/forum/#!topic/macs-announcement/4OCE59gkpKY

https://github.com/taoliu/MACS/issues/145

ADD COMMENT • link 7.4 years ago BioinfGuru ▴ 30

0

Entering edit mode

My understanding is that the "shift" in MACS2 is related to the modelled (or expected) fragment size (namely, half the mean fragment length). As Gord says above, extending the reads to the full fragment length (instead of using the read length) will cause single-end short reads to "overlap" features (peaks) not directly sequenced. Generally speaking, fragments in a sequencing library are longer than the reads, and also longer than regions of enrichment (peaks). Most reads that start 9bp from the actual site will overlap that site. So the overlap counts, while not perfect, should be a reasonable approximation of the "truth."

If there is concern about being more precise, the ATAC peaks can all be simply shifted 9bp (easy to do in R) before being processed by DiffBind. But keep in mind the goal of a differential analysis, which is relative and not absolute. We are attempting to define regions that likely have changed their enrichment levels systematically, which can be accomplished without having perfectly quantified every peak for every sample.

ADD REPLY • link 7.4 years ago Rory Stark ★ 5.2k