Question: DiffBind for ATAC-seq -- bFullLibrarySize parameter
gravatar for paz2005
4 weeks ago by
paz20050 wrote:

I've been trying to use DiffBind for ATAC-seq data. I noticed that there is a large difference in differential accessibility depending on whether bFullLibrarySize is TRUE or FALSE. Specifically, in one case, when set to TRUE, I detect 16,271 differentially accessible peaks, 14,577 of which have a positive fold change; when set to FALSE, for the same contrast, I detect 17,867 differentially accessible peaks, 9,510 of which have a positive fold change. So, when set to TRUE, roughly 90% of the fold changes are unidirectional, whereas when set to FALSE, the fold changes are bidirectional and closer to 50/50. Is there some rational for how one should decide when bFullLibrarySize should be TRUE or FALSE? Since I think this is likely related to FRiP, I'll post the scores below.

ID Condition Replicate Caller Intervals FRiP

A-1 A 1 counts 71007 0.44

A-2 A 2 counts 71007 0.44

A-3 A 3 counts 71007 0.3

A-4 A 4 counts 71007 0.29

A-5 A 5 counts 71007 0.38

B-1 B 1 counts 71007 0.44

B-2 B 2 counts 71007 0.44

B-3 B 3 counts 71007 0.47

B-4 B 4 counts 71007 0.47

B-5 B 5 counts 71007 0.5

B-6 B 6 counts 71007 0.38

Thank you.

diffbind atac • 78 views
ADD COMMENTlink modified 4 weeks ago by Rory Stark2.7k • written 4 weeks ago by paz20050
Answer: DiffBind for ATAC-seq -- bFullLibrarySize parameter
gravatar for Rory Stark
4 weeks ago by
Rory Stark2.7k
CRUK, Cambridge, UK
Rory Stark2.7k wrote:

Do you have a prior expectation that most of the changes will be gains in one sample group over the other? Were the libraries sequenced to similar depths (total reads)? In these cases you should NOT change the setting to FALSE.

The issue is how the data are normalized. Setting bFullLibrarySize=FALSE uses a standard RNA-seq normalization method (based on the number of reads in consensus peaks), which assumes that most of the "genes" don't change expression, and those that do are divided roughly evenly in both directions. Using the default bFullLibrarySize=TRUE avoids these assumptions and does a very simple normalization based on the total number of reads for each library.

You can better see what is going on by running the analysis both ways, then comparing three plots using dba.plotMA(). First run the plot with bNormalized=FALSE and th=1. If you see a high density of points above or below the center zero line, the changes are mostly in one direction (or there was some systematic imbalance between groups in how the libraries were prepared and sequenced). Then compare this to the plot after running dba.analyze() with bFullLibrarySize=TRUE (making sure that bNormalized=TRUE in the call to dba.plotMA()). . You may some adjustment of the densest part of the cloud toward the center line. Try again after running with bFullLibrarySize=FALSE. The binding values will be normalized to be mostly centered on the zero fold-change line, which is why you see more changes in both directions.

Based on the information I have, you almost certainly don't want to set bFullLibrarySize=FALSE!

ADD COMMENTlink written 4 weeks ago by Rory Stark2.7k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 204 users visited in the last hour