DiffBind for ATAC-seq -- bFullLibrarySize parameter
1
0
Entering edit mode
paz2005 • 0
@paz2005-19944
Last seen 2.7 years ago

I've been trying to use DiffBind for ATAC-seq data. I noticed that there is a large difference in differential accessibility depending on whether bFullLibrarySize is TRUE or FALSE. Specifically, in one case, when set to TRUE, I detect 16,271 differentially accessible peaks, 14,577 of which have a positive fold change; when set to FALSE, for the same contrast, I detect 17,867 differentially accessible peaks, 9,510 of which have a positive fold change. So, when set to TRUE, roughly 90% of the fold changes are unidirectional, whereas when set to FALSE, the fold changes are bidirectional and closer to 50/50. Is there some rational for how one should decide when bFullLibrarySize should be TRUE or FALSE? Since I think this is likely related to FRiP, I'll post the scores below.

ID Condition Replicate Caller Intervals FRiP

A-1 A 1 counts 71007 0.44

A-2 A 2 counts 71007 0.44

A-3 A 3 counts 71007 0.3

A-4 A 4 counts 71007 0.29

A-5 A 5 counts 71007 0.38

B-1 B 1 counts 71007 0.44

B-2 B 2 counts 71007 0.44

B-3 B 3 counts 71007 0.47

B-4 B 4 counts 71007 0.47

B-5 B 5 counts 71007 0.5

B-6 B 6 counts 71007 0.38

Thank you.

diffbind atac • 1.5k views
2
Entering edit mode
Rory Stark ★ 4.1k
@rory-stark-5741
Last seen 23 days ago
CRUK, Cambridge, UK

Do you have a prior expectation that most of the changes will be gains in one sample group over the other? Were the libraries sequenced to similar depths (total reads)? In these cases you should NOT change the setting to FALSE.

The issue is how the data are normalized. Setting bFullLibrarySize=FALSE uses a standard RNA-seq normalization method (based on the number of reads in consensus peaks), which assumes that most of the "genes" don't change expression, and those that do are divided roughly evenly in both directions. Using the default bFullLibrarySize=TRUE avoids these assumptions and does a very simple normalization based on the total number of reads for each library.

You can better see what is going on by running the analysis both ways, then comparing three plots using dba.plotMA(). First run the plot with bNormalized=FALSE and th=1. If you see a high density of points above or below the center zero line, the changes are mostly in one direction (or there was some systematic imbalance between groups in how the libraries were prepared and sequenced). Then compare this to the plot after running dba.analyze() with bFullLibrarySize=TRUE (making sure that bNormalized=TRUE in the call to dba.plotMA()). . You may some adjustment of the densest part of the cloud toward the center line. Try again after running with bFullLibrarySize=FALSE. The binding values will be normalized to be mostly centered on the zero fold-change line, which is why you see more changes in both directions.

Based on the information I have, you almost certainly don't want to set bFullLibrarySize=FALSE!

0
Entering edit mode

Hi Rory, Thanks for posting the explanation above. It is very useful to state what are the assumptions for the bFullLibrarySize and what happen when it is set as TRUE and FALSE.

I'm just wondering about one thing: you mentioned:

Based on the information I have, you almost certainly don't want to set bFullLibrarySize=FALSE

On a related issue, I always wonder how people decide which normalization would be appropriate for their experiments especially if they see bias (high density of points) mostly in one direction. Does this mean systematic bias or biological global genome-wide changes?

Thank you!