Question

DiffBind for ATAC-seq -- bFullLibrarySize parameter

0

Entering edit mode

paz2005 • 0

@paz2005-19944

Last seen 6.1 years ago

I've been trying to use DiffBind for ATAC-seq data. I noticed that there is a large difference in differential accessibility depending on whether bFullLibrarySize is TRUE or FALSE. Specifically, in one case, when set to TRUE, I detect 16,271 differentially accessible peaks, 14,577 of which have a positive fold change; when set to FALSE, for the same contrast, I detect 17,867 differentially accessible peaks, 9,510 of which have a positive fold change. So, when set to TRUE, roughly 90% of the fold changes are unidirectional, whereas when set to FALSE, the fold changes are bidirectional and closer to 50/50. Is there some rational for how one should decide when bFullLibrarySize should be TRUE or FALSE? Since I think this is likely related to FRiP, I'll post the scores below.

ID Condition Replicate Caller Intervals FRiP

A-1 A 1 counts 71007 0.44

A-2 A 2 counts 71007 0.44

A-3 A 3 counts 71007 0.3

A-4 A 4 counts 71007 0.29

A-5 A 5 counts 71007 0.38

B-1 B 1 counts 71007 0.44

B-2 B 2 counts 71007 0.44

B-3 B 3 counts 71007 0.47

B-4 B 4 counts 71007 0.47

B-5 B 5 counts 71007 0.5

B-6 B 6 counts 71007 0.38

Thank you.

diffbind atac • 2.6k views

ADD COMMENT • link updated 6.1 years ago by Rory Stark ★ 5.2k • written 6.1 years ago by paz2005 • 0

score 2 · Accepted Answer · 2019-02-21

Do you have a prior expectation that most of the changes will be gains in one sample group over the other? Were the libraries sequenced to similar depths (total reads)? In these cases you should NOT change the setting to FALSE.

The issue is how the data are normalized. Setting bFullLibrarySize=FALSE uses a standard RNA-seq normalization method (based on the number of reads in consensus peaks), which assumes that most of the "genes" don't change expression, and those that do are divided roughly evenly in both directions. Using the default bFullLibrarySize=TRUE avoids these assumptions and does a very simple normalization based on the total number of reads for each library.

You can better see what is going on by running the analysis both ways, then comparing three plots using dba.plotMA(). First run the plot with bNormalized=FALSE and th=1. If you see a high density of points above or below the center zero line, the changes are mostly in one direction (or there was some systematic imbalance between groups in how the libraries were prepared and sequenced). Then compare this to the plot after running dba.analyze() with bFullLibrarySize=TRUE (making sure that bNormalized=TRUE in the call to dba.plotMA()). . You may some adjustment of the densest part of the cloud toward the center line. Try again after running with bFullLibrarySize=FALSE. The binding values will be normalized to be mostly centered on the zero fold-change line, which is why you see more changes in both directions.

Based on the information I have, you almost certainly don't want to set bFullLibrarySize=FALSE!