Binding matrix creation DiffBind after MACS2 peak call
1
0
Entering edit mode
Lauma R ▴ 10
@lauma-r-14513
Last seen 4.3 years ago
Cambridge

Hello,

I was looking at previous posts about MACS2 peak files and read shift before DiffBind to match MACS2 settings. However, I am still a bit confused about right steps to take. This is my work strategy:

1)  MACS2 was used for peak enrichment, with —shift -25 and —extsize 50 (thus shift 25b from 3' ->5' and after that set read length = 50)

2)  Load NarrowPeaks files in DiffBind and obtain consensus peakset (for all replicates from the same condition, saying that peak is "real" if it is in at least 0.5 samples)

3) Create master consensus peakset, this time from all consensus peaksets per phenotype

3) Now I want to create binding matrix for master consolidated peakset; This is the step I am confused about;

Should I shift my reads 25b 3' ->5' before I read them in DiffBind, just to match the MACS2 settings?

I am currently shifting my master peakset (GRanges) back 25b, than reading bed files with fragmentSize = 50, and than after binding matrix  is created, reseting it back to original location.

I would be very grateful for any advice of what is the right way to do this.

Kind Regards

diffbind macs2 • 614 views
0
Entering edit mode
Rory Stark ★ 4.6k
@rory-stark-5741
Last seen 10 days ago
CRUK, Cambridge, UK

You don't need to shift the consensus intervals. The MACS2 NarrowPeaks already respect the shift and report the "true" enrichment interval, which is maintained by DiffBind.

Two notes:

You may want to look at how you are using the --shift and --extsize parameters in MACS2. These are not based on the read length, but rather the insert size or actual mean length of DNA fragment in the experiment. MACS2 can compute this but I assume you are setting --nomodel as you are specifying an --extsize? Likewise, the fragmentSize you use when calling dba.count() is not related to the read length. Is there a reason you are overriding MACS2's capability to model the fragment size distribution?

Depending on what type of data you have (TF binding, histone marks, ATAC etc), you may also want to consider using the summits option in dba.count(), eg summits=100 will give you consensus intervals that are centered around the point of greatest enrichment and extend 100bp up and downstream.

0
Entering edit mode

Dear Rory,

MACS2  is set to  —nomodel.  We are working on ATACseq data and we went for these MACS2 parameters following the suggestion from other group which is working on the same type of data.